main.dvi



1 PC 1 1 [1][2] [3][4] ( ) GPU(Graphics Processing Unit) GPU PC GPU PC ( 2 GPU ) GPU Harris Corner Detector[5] CPU ( ) ( ) CPU GPU 2 3 GPU

2 2 ( ) CPU ( ) ( ) () 2.1 ( 1) ( ) () CPU SIMD PC (CPU) SIMD (Intel :SSE4 AMD :SSE5 ) [6] CPU 4 32bit 46

3 input () output 1: SMP(Symmetric Multi Processing) CPU (Intel :Core2 AMD :Phenom ) SMP SMP OpenMP[7] CPU ( ) SMP 16 (= ) PC MPI[8] (Ethernet ) 3 GPU(Graphics Processing Unit) GPU 3 PC. GPU NVIDIA (GeForce ) AMD(ATI) (Radeon ). GPU 47

PC 1 GPU

Table 1: GPU (Nvidia GeForce 8800GT)
1.8GHz 14 7,168 (VRAM) 336GFlops 512MByte 57.6GByte/sec

PC PCI-express 2.0 PC 8GByte/sec GPU GPU GPGPU[9] 2 GPU (2006 ) 3 (OpenGL Direct3D) NVidia C GPU (CUDA : Compute Unified Device Architecture) [10] GPU GPU GPU ( 2)

Figure 2: GPU (diagram labels: CPU, RAM, bus bridge, PCIe (8GB/sec), VRAM, GPU)
1. Copy input data from RAM to GPU
2. Copy sub input data from VRAM to each GPU core
3. Execution, copy sub result to VRAM
4. Copy result to RAM

1. GPU 2. GPU GPU (CUDA ) 2 [11][12]
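The four steps listed for Figure 2 can be sketched on the host side as follows. This is a minimal CPU-only Python illustration, not CUDA code: the lists `vram_in` and `vram_out` merely stand in for device memory, and the names `split`, `run_on_gpu`, `invert` and `n_cores` are hypothetical. The default of 14 workers echoes the figure 14 in Table 1; reading that figure as a count of processing units is an assumption.

```python
def split(data, n_parts):
    """Split data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for k in range(n_parts):
        stop = start + size + (1 if k < extra else 0)
        chunks.append(data[start:stop])
        start = stop
    return chunks


def run_on_gpu(ram_input, kernel, n_cores=14):
    """Simulate the four-step flow; n_cores is an assumed worker count."""
    # Step 1: copy the input data from RAM to (simulated) VRAM.
    vram_in = list(ram_input)
    # Step 2: hand one sub-input to each (simulated) GPU core.
    sub_inputs = split(vram_in, n_cores)
    # Step 3: every core runs the same kernel; sub-results go back to VRAM.
    vram_out = []
    for sub in sub_inputs:
        vram_out.extend(kernel(x) for x in sub)
    # Step 4: copy the result from VRAM back to RAM.
    return list(vram_out)


def invert(pixel):
    """Example per-pixel kernel: invert an 8-bit intensity."""
    return 255 - pixel


result = run_on_gpu(list(range(20)), invert)
```

On a real device, steps 1 and 4 cross the PCI-Express bus, so they are the part of the flow most worth minimizing.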

5 3. GPU GPU 4. GPU GPU GPU PC 1 GPU 8.0GByte/sec PC CPU 10.6GByte/sec GPU CPU 14 7,000 GPU 300GFlops PC CPU 7 3 GPU CPU 4 GPU ( 3(a)) ( 3(b)) 2 3. Harris Corner Detector CUDA Microsoft Windows ( 4.5 )GPU CUDA NVidia GeForce 8800GT I in (x, y, c) I out (x, y, c) GPU 1TFlops CUDA
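The bandwidth figures quoted in the text (8.0GByte/sec over PCI-Express, 10.6GByte/sec for CPU memory, 57.6GByte/sec for VRAM) allow a rough estimate of how long it takes merely to move an image. The sketch below is a back-of-the-envelope calculation under stated assumptions: the image size and the 4 bytes per channel are hypothetical, 1 GByte is taken as 10^9 bytes, and latency and protocol overhead are ignored.

```python
# Bandwidths quoted in the text, in bytes per second (1 GByte = 1e9 bytes assumed).
PCIE_BW = 8.0e9
CPU_RAM_BW = 10.6e9
VRAM_BW = 57.6e9


def transfer_ms(n_bytes, bandwidth):
    """Ideal time in milliseconds to move n_bytes at the given bandwidth."""
    return n_bytes / bandwidth * 1e3


# Hypothetical image: 1024 x 1024 pixels, 3 channels, 4 bytes per channel.
n_bytes = 1024 * 1024 * 3 * 4

for name, bw in [("PCI-Express", PCIE_BW), ("CPU RAM", CPU_RAM_BW), ("VRAM", VRAM_BW)]:
    print(f"{name:12s} {transfer_ms(n_bytes, bw):.3f} ms")
```

Because input and result both cross the bus, a round trip costs about twice the PCI-Express figure, which is why a cheap per-pixel operation may gain little from the GPU unless the data stays in VRAM across several operations.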

Figure 3: (a) Input data I = I(1) ... I(5) from RAM; processing on GPU (one proc per element); result data R = R(1) ... R(5) to RAM. (b) Input data I = I(1) ... I(5) from RAM; processing on GPU (one proc per element); sub result S = S(1) ... S(5); merging by exclusive operation; result data R = R(1) ... R(3) to RAM.

RGB YUV HSI HSV rgb GPU RGB Sobel Prewitt LoG Sobel Prewitt I C

\[ \mathrm{Filter}(I, C) = I * C = \mathrm{IFFT}\bigl(\mathrm{FFT}(I) \cdot \mathrm{FFT}(C)\bigr) \]

CUDA FFT (CUDAFFT) Harris Corner Detector
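The identity Filter(I, C) = I * C = IFFT(FFT(I) FFT(C)) can be checked numerically on a small 1-D signal. The sketch below is an illustration only: it uses a naive O(N^2) DFT in pure Python rather than a CUDA FFT library, it assumes circular convolution with the kernel zero-padded to the signal length, and all function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (the inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def circular_convolve(signal, kernel):
    """Direct circular convolution, used as the reference result."""
    n = len(signal)
    return [sum(signal[(i - j) % n] * kernel[j] for j in range(n)) for i in range(n)]


def fft_filter(signal, kernel):
    """Filter(I, C) = IFFT(FFT(I) * FFT(C)), with an element-wise product."""
    spectrum = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [v.real for v in dft(spectrum, inverse=True)]


signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
kernel = [0.25, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]  # zero-padded smoothing kernel

direct = circular_convolve(signal, kernel)
via_fft = fft_filter(signal, kernel)
```

Both routes give the same values up to rounding error. With a fast transform the frequency-domain route costs O(N log N) instead of O(N^2), which is what makes it attractive for large filter kernels.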

( 4)

Figure 4: Grayscale input I = I(1) ... I(5); one proc per element; voting by exclusive operation (AtomicAdd()) into H(1) ... H(3); result data R = R(1) ... R(3).

CUDA ( ) AtomicAdd H1 H2 HIN(H1,H2) D1 D2 CORR(D1,D2)

\[ \mathrm{HIN}(H1, H2) = \sum_{i=1}^{i_{\max}} \min\bigl(h1(i),\, h2(i)\bigr) \]

\[ \mathrm{CORR}(D1, D2) = \frac{\sum_i (D1_i - \overline{D1})(D2_i - \overline{D2})}{\sqrt{\sum_i (D1_i - \overline{D1})^2 \, \sum_i (D2_i - \overline{D2})^2}} \]
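The voting scheme of Figure 4 and the two similarity measures HIN and CORR can be sketched in a few lines. This is a CPU-only illustration under stated assumptions: the per-worker partial histograms followed by a merge stand in for the exclusive AtomicAdd() update, CORR is taken to be the usual normalized correlation (with a square root in the denominator), and the names `vote_histogram`, `hin`, `corr` and `n_workers` are hypothetical.

```python
import math


def vote_histogram(pixels, n_bins, n_workers=4):
    """Build a histogram by voting: one partial histogram per worker, then merge.

    The merge plays the role of the exclusive AtomicAdd() update on the GPU.
    Pixel values must be integers in range(n_bins).
    """
    partials = [[0] * n_bins for _ in range(n_workers)]
    for idx, value in enumerate(pixels):
        partials[idx % n_workers][value] += 1
    return [sum(p[b] for p in partials) for b in range(n_bins)]


def hin(h1, h2):
    """Histogram intersection: sum over bins of min(h1(i), h2(i))."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def corr(d1, d2):
    """Normalized correlation of two equally long, non-constant sequences."""
    m1 = sum(d1) / len(d1)
    m2 = sum(d2) / len(d2)
    num = sum((a - m1) * (b - m2) for a, b in zip(d1, d2))
    den = math.sqrt(sum((a - m1) ** 2 for a in d1) * sum((b - m2) ** 2 for b in d2))
    return num / den


h1 = vote_histogram([0, 1, 1, 2, 3, 3, 3, 0], n_bins=4)
h2 = vote_histogram([0, 0, 1, 2, 2, 3, 3, 3], n_bins=4)
similarity = hin(h1, h2)
correlation = corr(h1, h2)
```

Keeping one partial histogram per worker avoids contention on shared bins; an atomic add on a single shared histogram is the simpler alternative when bins are rarely hit at the same time.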

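Harris Corner Detector

Harris Corner Detector Harris Corner Detector ( 5)

Figure 5: Harris Corner Detector (input image; red points: corners)

1. C I
2. $I_{xx} = \left(\frac{\partial I}{\partial x}\right)^2$, $I_{yy} = \left(\frac{\partial I}{\partial y}\right)^2$, $I_{xy} = \frac{\partial I}{\partial x}\frac{\partial I}{\partial y}$
3. Gaussian: $A = G * I_{xx}$, $B = G * I_{yy}$, $C = G * I_{xy}$
4. $H_i = \begin{pmatrix} A_i & C_i \\ C_i & B_i \end{pmatrix}$, eigenvalues $\lambda_1, \lambda_2$
5. $M_i = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2$

GPU I 4.4 GPU GPU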
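Steps 2 to 5 of the Harris Corner Detector can be sketched as below. This is a pure-Python CPU illustration, not the CUDA implementation described in the paper. Several choices are assumptions: the input is already grayscale, derivatives use 3x3 Sobel kernels, smoothing uses a 3x3 binomial approximation of the Gaussian, alpha is 0.04, and borders are replicated. The response is computed as det(H) - alpha * trace(H)^2, which equals lambda1 * lambda2 - alpha * (lambda1 + lambda2)^2 without computing the eigenvalues explicitly. All function names are hypothetical.

```python
def at(img, y, x):
    """Read a pixel, replicating the border for out-of-range coordinates."""
    y = max(0, min(len(img) - 1, y))
    x = max(0, min(len(img[0]) - 1, x))
    return img[y][x]


def filter3(img, kernel):
    """Apply a 3x3 kernel (as a correlation) to every pixel of img."""
    return [[sum(kernel[j + 1][i + 1] * at(img, y + j, x + i)
                 for j in (-1, 0, 1) for i in (-1, 0, 1))
             for x in range(len(img[0]))]
            for y in range(len(img))]


def multiply(p, q):
    """Element-wise product of two equally sized images."""
    return [[a * b for a, b in zip(rp, rq)] for rp, rq in zip(p, q)]


SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
GAUSS = [[1 / 16, 2 / 16, 1 / 16], [2 / 16, 4 / 16, 2 / 16], [1 / 16, 2 / 16, 1 / 16]]


def harris_response(gray, alpha=0.04):
    """Return the corner response M for every pixel of a grayscale image."""
    # Step 2: first derivatives and their products.
    ix = filter3(gray, SOBEL_X)
    iy = filter3(gray, SOBEL_Y)
    # Step 3: smooth the products to obtain A, B, C.
    a = filter3(multiply(ix, ix), GAUSS)
    b = filter3(multiply(iy, iy), GAUSS)
    c = filter3(multiply(ix, iy), GAUSS)
    # Steps 4 and 5: det(H) - alpha * trace(H)^2 with H = [[A, C], [C, B]].
    return [[a[y][x] * b[y][x] - c[y][x] ** 2 - alpha * (a[y][x] + b[y][x]) ** 2
             for x in range(len(gray[0]))]
            for y in range(len(gray))]


# Synthetic test image: a bright block whose only corner lies at row 4, column 4.
gray = [[1.0 if (y >= 4 and x >= 4) else 0.0 for x in range(12)] for y in range(12)]
response = harris_response(gray)
corner = max((response[y][x], y, x) for y in range(12) for x in range(12))[1:]
```

The response is positive at the corner, negative along straight edges and zero in flat regions, so thresholding M followed by non-maximum suppression yields the corner points. Every pixel is computed independently of the others, which is what makes the detector a good fit for per-pixel GPU threads.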

9 GPU GPU GPU CPU GPU1 ( 6) CPU CPU (Multi core) OpenMP SMP bus bridge VRAM ( ) GPU 1 : CPU core 1 GPU 2 : CPU core2 RAM ( ) VRAM ( ) 6: GPU 2 GPU GPU CPU C GPU CUDA Microsoft Windows DLL(Dynamic Link Library) DLL C Matlab 6 5 GPU 4 ( , , , ) RGB HSV Sobel 7 7 Gaussian Matlab DLL (loadlibrary ) 53
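The arrangement of Figure 6, in which each CPU core drives one GPU, can be sketched as a two-way split of the image. This is a CPU-only Python illustration: worker threads stand in for the OpenMP threads and their GPUs, the per-pixel RGB to HSV conversion uses the standard-library `colorsys` module, and the names `rgb_to_hsv_rows` and `process_on_two_devices` are hypothetical.

```python
import colorsys
from concurrent.futures import ThreadPoolExecutor


def rgb_to_hsv_rows(rows):
    """Per-pixel RGB -> HSV conversion for a block of image rows."""
    return [[colorsys.rgb_to_hsv(r, g, b) for (r, g, b) in row] for row in rows]


def process_on_two_devices(image):
    """Split the image into two halves and process each on its own worker thread."""
    mid = len(image) // 2
    halves = [image[:mid], image[mid:]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(rgb_to_hsv_rows, halves))  # map() preserves order
    return results[0] + results[1]


# Tiny 3 x 2 test image with channel values in [0, 1].
image = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
    [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
]
hsv = process_on_two_devices(image)
```

Splitting by rows works unchanged for per-pixel operations such as color conversion. Neighborhood filters (Sobel, Gaussian) would additionally need a few overlapping rows at the seam so that each half sees the pixels its kernel reaches.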

2 2 4 ( , , , ) 3 (Matlab (CPU ) GPU GPU2 (CUDA ))) GPU Matlab DLL GPU2 CPU GPU GPU ( 2 5 ) Matlab CPU Intel Core2 Quad Q6600 ( 40GFlops) 4GByte CPU 10.6GByte/sec GPU 8GByte/sec(PCI-Express) CPU 2 5

Table 2: ( : ) unit: msec
CPU        GPU        GPU
(9.302)    (0.651)    (3.140)

Table 3: ( : ) unit: msec
CPU        GPU        GPU
(45.70)    (0.716)    (8.547)

Table 4: ( : ) unit: msec
CPU        GPU        GPU
(168.8)    (0.7697)   (28.65)

Table 5: ( : ) unit: msec
CPU        GPU        GPU
(568.6)    (0.7170)   (115.4)

GPU GPU CPU 2 1. GPU CPU GPU 2. GPU GPU CPU CPU 16 GPU1 GPU GPU ( ) GPU

6.1 GPU GPU CPU GPU GPU GPU CPU GPU GPU CPU GPU GPU 7 GPU GPU DLL Harris Corner Detector GPU GPU PC

[1] 35 5 pp
[2] 6 H pp
[3] pp
[4] ,,, 10 pp
[5] A combined corner and edge detector, C. Harris and M. Stephens, Proceedings of the 4th Alvey Vision Conference, pp
[6] Intel Streaming SIMD Extensions 4 (SSE4) Instruction Set, Intel Corp., architecture-silicon/sse4-instructions/, 2007
[7] The OpenMP specification for parallel programming, OpenMP Architecture Review Board
[8] Message Passing Interface Forum, MPI Forum
[9] General-Purpose Computation Using Graphics Hardware
[10] NVIDIA CUDA Zone, NVIDIA Corp., home.html, 2007
[11] GPU-based implementation of the KLT Tracker, ssinha/research/gpu KLT/
[12] GPU-based implementation of Scale Invariant Feature Transform, ccwu/siftgpu/

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU GPGPU (I) GPU GPGPU 1 GPU(Graphics Processing Unit) GPU GPGPU(General-Purpose computing on GPUs) GPU GPGPU GPU ( PC ) PC PC GPU PC PC GPU GPU 2008 TSUBAME NVIDIA GPU(Tesla S1070) TOP500 29 [1] 2009 AMD