i GPU GPU GPU GPU CPU Radeon X800 Pro 3.2 α


GPU 18 2 3


Studies on the Speedup of Volume Rendering with Commodity GPUs

Yuki SHINOMOTO

Abstract

A slice-order texture-based algorithm for volume rendering with commodity GPUs loses spatial locality of reference and suffers from a low cache hit ratio at some viewpoints. This is because the access pattern of volume references depends on the position of the viewpoint. A cuboid-order ray-casting algorithm that maximizes spatial locality of reference has been proposed. The cuboid-order algorithm divides the volume into subvolumes called cuboids and controls the access pattern by rendering one cuboid at a time. Locality is maximized by detecting and sampling the points in a cuboid while it is in the cache memory, before the cache lines composing the cuboid are replaced.

In this paper we propose a cuboid-order texture-based algorithm based on the cuboid-order ray-casting algorithm. GPUs cannot control the access pattern as easily as CPUs. Our algorithm controls the access pattern by dividing each slice into smaller slices and arranging them in cuboid order, so that each cuboid is rendered in turn. The number of slices increases in proportion to the number of cuboids.

We evaluate our algorithm on a Radeon X800 Pro. The results show that the performance of the cuboid-order algorithm is lower than that of the slice-order algorithm when the cuboids are smaller than the texture cache, and 3.2 times higher at some cuboid sizes. The performance of the cuboid-order algorithm suffers from the cost of processing the vertices of the slices and of switching between the cuboids being processed. We also propose address transformation of the volume and blending in the fragment processor of the GPU. The former reduces the cost of changing cuboids and the latter reduces the number of vertices.

1
2
2.1 Volume Rendering
2.1.1
2.2 GPU
2.2.1 GPU
2.2.2
2.2.3 GPU
2.2.4
2.2.5 GPU
2.2.6 GPU
2.3 Texture Based Volume Rendering
2.3.1
2.3.2
2.4
2.5
3
3.1
3.1.1
3.1.2
3.2
3.2.1
3.3
3.3.1
3.3.2
3.4
3.5
3.5.1
3.6
4
4.1
4.2
4.3
4.4
5
5.1
5.2
5.2.1
5.2.2
5.3 GPU
6

1 3 2 / [1] / 4K 3 [2] [3] 6.5TFLOPS 64GB GPU GPU GPU GPU GPU GPU 1

CPU CPU [4, 5] DRAM CPU GPU GPU GPU CPU GPU 2 GPU 3 4 5 2

2 GPU 2.1 Volume Rendering 3, 2,. 2.,., 3,,.,, CPU GPU PC B Ray A I(A, B) Volume 1: 3 3

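The intensity I(A, B) accumulated along the ray from A to B (Figure 1) is

$$I(A, B) = \int_A^B g(s)\, e^{-\int_A^s \tau(x)\,dx}\, ds \tag{1}$$

where s and x are positions along the ray, g(p) is the source term at point p, and \tau is the extinction coefficient. Discretizing Eq. (1) gives

$$I(A, B) = \sum_{i=A}^{B} g(s_i)\, e^{-\sum_{j=A}^{s_i} \tau(x_j)} \tag{2}$$

2.1.1 (Figure 2)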

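Compositing proceeds either front to back or back to front. In back-to-front compositing, let v_0, v_1, ..., v_n be the samples along the ray, with RGB color c(v_k) and opacity \alpha(v_k) at sample v_k. The final color C is

$$C = \sum_{i=0}^{n} \alpha(v_i)\, c(v_i) \prod_{j=0}^{i-1} \bigl(1 - \alpha(v_j)\bigr) \tag{3}$$

Writing C_k for the color accumulated from samples v_k, ..., v_n gives the recurrence

$$C_{k-1} = \alpha(v_{k-1})\, c(v_{k-1}) + \bigl(1 - \alpha(v_{k-1})\bigr)\, C_k \tag{4}$$

and C = C_0.

(2), RGB α. α RGB, α, RGB. 2: 2.2 GPU GPU GPU 3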
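The equivalence of the direct sum in Eq. (3) and the back-to-front recurrence in Eq. (4) can be checked numerically. The following is a minimal C sketch for a single color channel; the function names and the array-based interface are assumptions made for illustration and do not come from the thesis. Sample 0 is taken to be the one nearest the viewer, and the background behind the last sample is assumed to contribute nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Direct evaluation of Eq. (3) for one color channel:
 * C = sum_i alpha_i * c_i * prod_{j<i} (1 - alpha_j).
 * n is the number of samples; sample 0 is nearest the viewer. */
double composite_sum(const double *color, const double *alpha, size_t n)
{
    assert(n == 0 || (color != NULL && alpha != NULL));
    double result = 0.0;
    double transmittance = 1.0; /* running product of (1 - alpha_j), j < i */
    for (size_t i = 0; i < n; ++i) {
        result += alpha[i] * color[i] * transmittance;
        transmittance *= 1.0 - alpha[i];
    }
    return result;
}

/* Back-to-front recurrence of Eq. (4), starting from an empty background
 * and ending with C_0. */
double composite_back_to_front(const double *color, const double *alpha, size_t n)
{
    assert(n == 0 || (color != NULL && alpha != NULL));
    double c = 0.0;
    for (size_t k = n; k-- > 0; ) {
        c = alpha[k] * color[k] + (1.0 - alpha[k]) * c;
    }
    return c;
}
```

Both functions return the same value for the same samples. The recurrence form is the one that maps onto frame-buffer alpha blending, since each step only needs the current sample and the color accumulated so far.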

GPU GPU GPU NVIDIA GeForce ATI Radeon Matrox Parhelia 2.2.1 GPU GPU (Vertex Processor) RGBα (Triangle Setup Engine) 3 (rasterizer) GPU (Fragment Processor) (texture unit) (Texture) 1 6

(Texture Cache) 2 GPU[6] 1 2 2 CPU CPU (Pixel Unit) (Raster Unit) α (Frame Buffer) RAMDAC (Random Access Memory Digital-to-Analog Converter) (Video Memory) GDDR GDDR DRAM GPU GPU 512MB 2.2.2 7

GPU 3 GPU

Figure 3: GPU/application -> Programmable Vertex Processor -> Primitive Assembly -> Rasterization & Interpolation -> Programmable Fragment Processor -> Frame-Buffer Tests & Blending -> Frame Buffer. Inputs: vertex indices, textures, and uniform parameters (matrices, light positions, blend factors, and other parameters that change infrequently). Data flows: primitive, vertex, and fragment data.

CPU GPU / CPU GPU GPU CPU CPU CPU 3 RGBα α 2.2.3 GPU GPU API API (Graphics API) CPU GPU API GPU API Windows

DirectX [7] OS OpenGL [8] API GPU OpenGL

    glColor3f(1.0, 1.0, 1.0);
    glBegin(GL_QUADS);
    glVertex3f(-1.0, -1.0, 0.0);
    glVertex3f( 1.0, -1.0, 0.0);
    glVertex3f( 1.0,  1.0, 0.0);
    glVertex3f(-1.0,  1.0, 0.0);
    glEnd();

API GPU 2 CPU-GPU GPU 2.2.4 API Cg (C for Graphics) HLSL (High Level Shader Language), GLSL

(OpenGL Shading Language) GPU NVIDIA microsoft OpenGL ARB CPU API 2.2.5 GPU GPU GPU RGBα 1 1 RGBα RRRR BGG - GPU GPU 2 GPU RGBα 10

2.2.6 GPU GPU API DirectX DirectX 7 GPU CG,Hardware T&L (Hardware Transformation and Lighting) DirectX 8 GPU CG 2 (vertex program) (fragment program) DirectX 9 GPU DirectX 9c GPU GPU GPU GPU GPU GPU GPGPU (General-Purpose computation on GPUs) [9] 11

2.3 Texture Based Volume Rendering GPU [10, 11] 2.3.1 1. 2. 3. 4. 5., α CPU GPU API, α ( 4)., GPU [12]. 4: 12

GPU GPU 3 3 ( 5). CPU GPU API 3 5: 3 GPU 3 2 2D GPU 3 3 3 1 1 1 Z Z Z 13

1 1 GPU CPU CPU GPU GPU GPU 6 GPU ( 6) 14

6: [0,0, 1.0] (0.5, 0.5, 0.5) 7 100 100 [0, 1] GPU Z 15

(-50.0, 50.0) (50.0, 50.0) (0.0,1.0) (1.0,1.0) (-50.0, -50.0) (50.0, -50.0) (0.0,0.0) (0.0,1.0) 7: Z RGBα GPU CPU 3 GPU 3 3 4 GPU GPU GPU GPU Dependent Texture ( ) [13] 2 3 RGBα 1 16

0 1 3 1 ( 8) 0 2 3... 244 255 244 R 0 200 50 20 G 0 64 B 0 185 A 0 244 8: 3 2 3 1 GPU 17

α 2.3.2 1 2 4 1/8 GPU CPU RC [4] GPU α CPU GPU 18

2 3 CG 2 GPU 2.4 CPU [14, 15] ( 9) 1 1 DRAM 6 1.15 [5] 19

1 2 5 3 6 9 4 7 10 13 8 11 14 12 15 16 9: CPU GPU [4] Itanium2 128 3 9FPS 4 Radeon X800 Pro 512 3 1.8FPS Radeon X800 Pro Itanium2 12.8 GPU CPU View 2.5 2 GPU GPU GPU GPU 4 1/8 CPU 20

21

3 GPU (Cuboid: ) GPU GPU CPU CPU 22

GPU GPU CPU 4 GPU GPU 3.1 GPU GPU CPU 3.1.1 GPU 2 2 GPU 2 2 2 3.1.2 CPU ID 23

CPU ID GPU ID 3.2 GPU CPU 2 CPU CPU API GPU x, y, z

    int x, y, z;

    void loop_x(void)
    {
        /* indices below cx, in ascending order */
        for (x = 0; x < cx; ++x)
            loop_y();
        /* indices above cx, in descending order */
        for (x = X_MAX; x > cx; --x)
            loop_y();
        /* index cx itself, last */
        x = cx;
        loop_y();
    }

X_MAX x 1 10 2 cx x x 10 2 cy cy > 3 loop_x x loop_x x < cx, x > cx, x = cx 3 x < cx 0 cx - 1 0, 1, 2, ..., cx - 1

Figure 10 (#: cuboid)

         x=0    x=1    x=2    x=3
y=0      1-1    2-1    4-1    3-1
y=1      1-2    2-2    4-2    3-2
y=2      1-3    2-3    4-3    3-3
y=3      1-4    2-4    4-4    3-4

x > cx X_MAX cx + 1 x = cx 0, 1, 3, 2 loop_y y 0, 1, 2, 3 cx > X_MAX cx < 0 loop_y() 10 1-1, 1-2, 4-4 3.2.1 x, y xy y yx x 10 1-1, 2-1,, 4-4 10 xy
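The visiting order produced by loop_x can be sketched for a single axis as follows. This is an illustrative C sketch, not the thesis implementation: the function name axis_order and the output-array interface are assumptions, as is the handling of cx outside the range 0 to X_MAX (here the loops are simply clamped and cx itself is skipped).

```c
#include <assert.h>
#include <stddef.h>

/* Writes the order in which indices 0..x_max are visited along one axis:
 * first the indices below cx in ascending order, then the indices above cx
 * in descending order, and finally cx itself.
 * Assumption of this sketch: when cx lies outside [0, x_max] the loops are
 * clamped to the valid range and cx is not visited.
 * Returns the number of indices written to out (always x_max + 1). */
int axis_order(int cx, int x_max, int *out)
{
    assert(out != NULL && x_max >= 0);
    int n = 0;
    int below_end = cx < x_max + 1 ? cx : x_max + 1; /* visit 0 .. below_end-1 */
    int above_end = cx > -1 ? cx : -1;               /* visit x_max .. above_end+1 */

    for (int x = 0; x < below_end; ++x)
        out[n++] = x;
    for (int x = x_max; x > above_end; --x)
        out[n++] = x;
    if (cx >= 0 && cx <= x_max)
        out[n++] = cx;
    return n;
}
```

With cx = 2 and x_max = 3 the order is 0, 1, 3, 2, which agrees with the column labels 1, 2, 4, 3 in Figure 10.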

3.3 GPU 11: 3.3.1 ( 12) 26

GPU 6 8 GPU 1-1 1-1 1-1 1-2 12: 3.3.2 27

CPU ( 13) 13: 28

3.4 3.5 n 3 O(n 2 ) n 3 n 2 n 3 29

3.5.1 1 2 ( 14) GPU 1 2 [9] GPU 3/4 GPU 1/2 14: 30

3.6 3 GPU GPU n 3 n 2 31

4 2 2 GPU 1 3 2

4.1
CPU: Intel Pentium 4 2.5 GHz. GPU: ATI Radeon X800 Pro. OS: Linux 2.6.8. Implementation: C, OpenGL 1.5, Cg (C for Graphics). Table 1 lists the specifications of the Radeon X800 Pro.

Table 1: Radeon X800 Pro

  Core clock                 475 MHz
  Memory bandwidth           28.8 GB/sec.
  Pixel fill rate            5.7 GPixels/sec.
  Triangle rate              712.5 MTriangles
  Memory                     256 MB
  Memory Interface (bit)     256
  Memory Data Rate           900 MHz
  Pixels per Clock (peak)    12

4.2 1 GPU 1 1 512 512, 512 512 3 (128MB) 8 3 (512Byte) ALPHA 8 1Byte α 2 α RGBα X 0 360 15 Y 0 360 16 (FPS) 15 16 8 3 Radeon X800 Pro 4KB 512Byte 33
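The sizes quoted above follow from simple arithmetic, sketched here in C. The helper names are hypothetical and introduced only for illustration. With one byte per voxel (the ALPHA 8 format), a 512^3 volume occupies 128 MB and an 8^3 cuboid occupies 512 bytes; dividing the volume into 8^3 cuboids yields 64^3 = 262,144 cuboids, the figure that also appears in the discussion of the results. A 16^3 cuboid occupies exactly 4 KB, the other size mentioned in the text, which this sketch takes to be the texture cache capacity (an assumption).

```c
#include <assert.h>
#include <stddef.h>

/* Storage in bytes of a cubic block with the given edge length in voxels. */
size_t cube_bytes(size_t edge, size_t bytes_per_voxel)
{
    return edge * edge * edge * bytes_per_voxel;
}

/* Number of cuboids obtained by dividing a cubic volume into cubic cuboids.
 * The cuboid edge must divide the volume edge evenly. */
size_t cuboid_count(size_t volume_edge, size_t cuboid_edge)
{
    assert(cuboid_edge > 0 && volume_edge % cuboid_edge == 0);
    size_t per_axis = volume_edge / cuboid_edge;
    return per_axis * per_axis * per_axis;
}
```

For example, cube_bytes(512, 1) gives 134,217,728 bytes (128 MB) and cuboid_count(512, 8) gives 262,144.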

Figure 15: FPS (0 to 14) versus rotation angle about the X axis (0 to 360 degrees); series: 512^3, 256^3, 128^3, 64^3, 16^3, 8^3.

Figure 16: FPS (0 to 14) versus rotation angle about the Y axis (0 to 360 degrees); series: 512^3, 256^3, 128^3, 64^3, 16^3, 8^3.

CPU DRAM DRAM GPU DRAM 4.3 2 1 1 1 1 512 3 Byte 1 512 3 16 3 X 0 180 2 (FPS) 3 n O(n 2 ) 2 O(1/n 2 ) FPS FPS 45 0 2 FPS 1/ 2 35

Table 2: FPS by rotation angle about the X axis (degrees) and size

Angle    512^3    256^3    128^3    64^3    32^3    16^3
0       1316.7    386.3    131.5    41.8    10.7     2.0
45       924.4    255.7     85.3    25.6     6.5     1.7
90      1310.3    383.7    130.6    41.6    10.6     2.0
135      922.2    255.5     85.3    25.6     6.5     1.7
180     1304.3    383.6    130.6    41.6    10.6     2.0

FPS n O(n 3 ) 32 3 FPS 10.7 6.5 4.4 1 8 3 2 32 3 512 3 512 2 1 (512/ 1 ) 3 X 0 360 17 Y 0 360 18
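The text preceding Table 2 appears to relate the frame rate at 45 degrees to the frame rate at 0 degrees by a factor of 1/sqrt(2), roughly 0.707. The C sketch below computes that ratio from the Table 2 values so the claim can be compared against the measurements. The array layout and the function name are assumptions made for illustration; the numbers are copied from the 0-degree and 45-degree rows of the table.

```c
#include <assert.h>

/* FPS values copied from Table 2, columns 512^3, 256^3, 128^3, 64^3, 32^3, 16^3. */
static const double fps_0deg[6]  = {1316.7, 386.3, 131.5, 41.8, 10.7, 2.0};
static const double fps_45deg[6] = { 924.4, 255.7,  85.3, 25.6,  6.5, 1.7};

/* Ratio of the 45-degree frame rate to the 0-degree frame rate for column i. */
double fps_ratio_45_to_0(int i)
{
    assert(i >= 0 && i < 6);
    return fps_45deg[i] / fps_0deg[i];
}
```

For the 512^3 column the ratio is about 0.70, close to 1/sqrt(2). The other columns give lower or higher ratios (about 0.61 for 32^3), so the relation holds only approximately across sizes.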

8 3 GPU

Figure 17: FPS (0 to 14) versus rotation angle about the X axis (0 to 360 degrees); series: 512^3, 256^3, 128^3, 64^3, 32^3, 16^3.

Figure 18: FPS (0 to 14) versus rotation angle about the Y axis (0 to 360 degrees); series: 512^3, 256^3, 128^3, 64^3, 32^3, 16^3.

512 3 64 3 512 3 13.9FPS 512 3 1.8FPS 64 3 5.3FPS 32 3 10.5FPS 5.9FPS 16 3 1 1 8 3 GPU 64 3 = 262, 144 API GPU 32 3 2 32 3 17, 18 1.8FPS 32 3 5.7FPS 13.9FPS 10.5FPS 8 3 38

5 4 4 2 5.1 2 1 1 F P S Vertices 0 3 3: 1 2 3 4 3 8 3 16 3 32 3 2,022,451 2,373,427 3,231,744 4,109,107 4,207,411 3,145,728 4.2 MVertices 1 1/100 5.2 39

5.2.1 1 GPU 3 2 GPU 2 2 2 n 3 n 3/2 n 3/2 2 8 3 16 16 GPU 3 2 2 [16] 3D 1D 1D 2D 3D 2D 1 2 Cg

    float2 addrtranslation_1dto2d( float address1d, float2 texsize )
    {
        // Scale factors that turn a 1D address into normalized 2D coordinates.
        float2 CONV_CONST = float2( 1.0 / texsize.x, 1.0 / (texsize.x * texsize.y) );
        float2 normaddr2d = address1d * CONV_CONST;
        // x keeps the fractional part (position within the row); y is the normalized row.
        float2 address2d = float2( frac(normaddr2d.x), normaddr2d.y );
        return address2d;
    }

frac(normaddr2d.x) normaddr2d.x texsize 2 address1d 1 3 1 Cg

    float2 addrtranslation_3dto2d( float3 address3d, float3 sizetex3d, float2 sizetex2d )
    {
        // Strides of the x, y, and z axes in the linearized (1D) volume.
        float3 SIZE_CONST = float3( 1.0, sizetex3d.x, sizetex3d.y * sizetex3d.x );
        float address1d = dot( address3d, SIZE_CONST );
        return addrtranslation_1dto2d( address1d, sizetex2d );
    }

dot(address3d, SIZE_CONST) address3d SIZE_CONST 3D 1D GPU 2 3 2 3 2D 2 1 3 2 3 ( 19) 3 GPU 2 4096 4096 2 [5]
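The two Cg routines above can be mirrored on the CPU to check the address arithmetic by hand. The following C sketch is illustrative only: the names vec2, frac_nonneg, addr_3d_to_1d, and addr_1d_to_2d are assumptions, and doubles stand in for Cg floats. It linearizes a 3D voxel address with strides (1, size_x, size_x * size_y) and then converts the 1D address into normalized 2D texture coordinates in the same way as the Cg code.

```c
#include <assert.h>

typedef struct { double x, y; } vec2;

/* Fractional part of a non-negative value (stand-in for Cg's frac()). */
static double frac_nonneg(double v)
{
    assert(v >= 0.0);
    return v - (double)(long long)v;
}

/* 3D voxel address -> 1D address, i.e. dot(address3d, SIZE_CONST). */
double addr_3d_to_1d(double x, double y, double z, double size_x, double size_y)
{
    return x + y * size_x + z * size_x * size_y;
}

/* 1D address -> normalized 2D coordinate in a tex_w x tex_h texture.
 * x is the fractional position within the row, y is the normalized row. */
vec2 addr_1d_to_2d(double address1d, double tex_w, double tex_h)
{
    assert(tex_w > 0.0 && tex_h > 0.0);
    vec2 r;
    r.x = frac_nonneg(address1d / tex_w);
    r.y = address1d / (tex_w * tex_h);
    return r;
}
```

As a worked example, voxel (3, 2, 1) of an 8^3 block has 1D address 3 + 2*8 + 1*64 = 83. In a 32 x 16 texture this maps to (0.59375, 0.162109375), that is, column 19 of row 2, and 2*32 + 19 = 83 recovers the 1D address.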

2 1 4 2 3 1 3 4 5 6 8 6 7 5 8 7 19: [0:1.0] GPU 5.2.2 2 α 2 α α 42

Cg lerp() α α α GPU GPU GPU GPU GPU [17] 8 3 8 24 8 GPU RC 43

GPU GPU 5.3 GPU GPU GPU GPU CPU GPU GPU CPU GPU CPU GPU CPU [13] CPU GPU CPU GPU CPU GPU GPU GPU CPU API 5.2.1 GPU 44

6 GPU 3 1 α GPU GPU GPU GPU GPU 45

46

[1] Lichtenbelt, B., Crane, R. and Naqvi, S.: Introduction To Volume Rendering, Hewlett-Packard (1998).
[2] , : X TV SHIN- MAVISION ELNOS, Medical Now, Vol. 44, pp. 14-15 (2000).
[3] , : SHD - -, (SPWS-TMWG) 005 (1999).
[4] ,,,, :,, Vol. 44, No. SIG 11(ACS 3), pp. 137-146 (2003).
[5] ,,,, :,, Vol. 45, No. SIG 11(ACS 7), pp. 356-367 (2004).
[6] Montrym, J. and Moreton, H.: The GeForce 6800, IEEE Micro, Vol. 25, No. 2, pp. 41-51 (2005).
[7] DirectX: http://www.microsoft.com/windows/directx/.
[8] OpenGL: http://www.opengl.org/.
[9] GPGPU: http://www.gpgpu.org/.
[10] Muraki, S., Lum, E. B., Ma, K.-L., Ogata, M. and Liu, X.: A PC Cluster System for Simultaneous Interactive Volumetric Modeling and Visualization, IEEE Symposium on Parallel and Large-Data Visualization and Graphics, pp. 95-102 (2003).
[11] ,, : PC, CVIMl-130-10 (2001).
[12] Rezk-Salama, C., Engel, K., Bauer, M., Greiner, G. and Ertl, T.: Interactive Volume Rendering on Standard PC Graphics Hardware Using Multi-Textures and Multi-Stage Rasterization, Proceedings of Eurographics/SIGGRAPH Workshop on Graphics Hardware (2000).
[13] ,,,,, :, 2005 ARC 164 (SWoPP 2005), pp. 145-150 (2005).
[14] Wolfe, M.: More Iteration Space Tiling, Proceedings of Supercomputing (SC '89), pp. 655-664 (1989).
[15] Lam, M. S., Rothberg, E. E. and Wolf, M. E.: The Cache Performance and Optimizations of Blocked Algorithms, In Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 63-74 (1991).
[16] Pharr, M. and Fernando, R. (eds.): GPU Gems 2: Programming Techniques For High-Performance Graphics And General-Purpose Computation, Addison-Wesley Pub (2005).
[17] Stegmaier, S., Strengert, M., Klein, T. and Ertl, T.: A Simple and Flexible Volume Rendering Framework for Graphics-Hardware-based Raycasting, Proceedings of Eurographics/IEEE VGTC Workshop on Volume Graphics 2005, pp. 187-195 (2005).