GPGPU 2013 1008 2015 1 23
Abstract In recent years, with the advance of microscope technology, the alive cells have been able to observe. On the other hand, from the standpoint of image processing, the quality assessment which determine whether cells were successfully cultured has been studied. Therefore, the cell image processing is performed not only static images but also moving images. However, because of moving images are aggregation of static images, the processing of these images would take large burdens. Thus, we focused on massive cores, which Graphics Processing Units (GPU) contains, to solve the problem above. In this study, to aim high quality cell segmentation system, we build the spatiotemporal image processing system using General-Purpose computing on Graphics Processing Units (GPGPU) as parallel processing. To evaluate the performance of our proposal system, we compared the processing speed of the spatio-temporal image processing with two cases: one is only using CPU and another is GPGPU. As the result, processing speed of GPGPU is faster than CPU. Moreover, the difference between these processing speeds was extend as increasing the amount of data. Based on the above, the usefulness of GPGPU use in cell segmentation system with moving images had been suggested.
1 1 2 GPGPU 2 2.1 CPU..................... 2 2.2 GPU..................... 2 2.3 GPGPU GPU...................... 3 2.4 CUDA............................. 4 3 GPGPU 5 3.1............................ 5 3.2................................ 6 4 7 4.1..................................... 7 4.2..................................... 7 5 9 6 10
1,,?.,, 1.,, 3?. (1) X,. (2),. (3)., X??,.,,,,?.,,,.,,., 30 [ms]?, 1, 33.,, Graphics Processing Unit (GPU). GPU,,, Central Processing Unit (CPU)?. GPU, GPU (General-purpose computing on graphics processing units: GPGPU)., CPU GPU?,., GPGPU.., 2, GPU,.,??,,,. 3,,., 4, (1)CPU (2)GPGPU, GPGPU. 5 CPU, GPGPU, 6 GPGPU. 1 http://www.nedo.go.jp/activities/zz 00184.html 1
2 GPGPU, GPU, GPGPU. 2.1 CPU CPU,,,., CPU. 18, CPU 2.1.. p = 2 n 1.5 (2.1), 2010,.,, CPU. 2, CPU, CPU., CPU. CPU, CPU., 1 1,. CPU,.,,. 2.2 GPU 2.1, CPU,.,, CPU,,.,, GPU. GPU,., 1, 1,,,. GPU,. GPGPU,, GPU.,, NVIDIA GPU C 2
Compute Unified Device Architecture (CUDA). CUDA, CUDA,, Fermi. 2.3 GPGPU GPU 2.3.1 Fermi Fermi, NVIDIA GPGPU. GPU,. Fermi, Streaming Processor (SP) (CUDA ), SP 32 Streaming Multi-processor (SM). Fermi SM 16, SP 512 (32x 16)., Fermi.,., GPU, 1., 1,.,,., Fermi, L1, L2,, GPU. Fermi GPU,,., Kepler. 2.3.2 Kepler Kepler, Fermi. Kepler 1W, Fermi 3.,. Fermi Fig. 1. Kepler Fermi 2,,., Fermi GPU, Kepler CPU, GPU., SP Fermi 32, 192 SMX. Kepler GPU GTX680, SMX 8, SP 1,536 (192x 8). 3
2.4 CUDA 2.3, GPGPU GPU,, CPU., GPGPU CPU. CPU GPGPU, CUDA(Fig. 2(a) ). CUDA, CPU, GPGPU,, CPU GPGPU.. CPU CUDA Application Programming Interface(, API), CPU GPGPU., GPGPU CUDA C Language, GPGPU. GPU., CPU GPGPU, CUDA. CUDA, (Fig. 2(b) ). 1 GPU. 2 CPU GPU 3. 4. GPU. 5 GPU CPU. 6 GPU. GPGPU Fig. 2(c) Grid, Block, Thread 3,. Block, CPU GPGPU., GPU, GPGPU., GPGPU,. 4
3 GPGPU, 2. 1. 2 GPGPU. 1 Fig. 3. Fig. 3,,.,. Fig. 4.,,,.,,. Fig. 5., GPGPU, (1). 3.1 3.1.1, ( ).,.,,., (Fig. 3),,. Fig. 3,, ( ), ( ) Fig. 6,.,.,,,.., Fig. 4. 1. 2,. 3, 1 (2). 4 (2) (3). 5
5,.,.,,. 3.1.2, 3. 3, x, y,.,.,,,. Fig. 5.,.,. 3.2 GPU GPGPU, NVIDIA 2 GPU C Compute Unified Device Architecture(CUDA). CUDA, CPU GPU. CUDA. 1 GPU. 2 CPU GPU 3. 4. GPU. 5 GPU CPU. 6 GPU. 2 Parallel Programming and Computing Platform NVIDIA http://www.nvidia.com/object/cuda home new.html 6
4 4.1 GPU Table 1, GPGPU Table 2. GTX680 2. 4.2 4.2.1 (Fig. 7). 1 2 (2 ) 3 Non Local(NL) Means 4 5 (20 ) 6, (1).,,. (2) 3.1.2 2,. (3)NLMeans. NLMeans,,.,, (, L2 ).,,.,,, NLMeans (1 x 1),.. ω(p, q) = Z(p, q) = q S 1 Z(p) exp( max( v(p) v(q) 2 2 2σ2, 0) h 2 ) (4.1) exp( max( v(p) v(q) 2 2 2σ2, 0) h 2 ) (4.2) out(p) = q S ω(p, q)in(q) (4.3) 2 GTX 680 Kepler Whitepaper - GeForce http://www.geforce.com/active/en US/en US/pdf/ GeForce-GTX-680-Whitepaper-FINAL.pdf 7
p S q w(p, q), p v(p) v(q) L2,, 0. h. out(p) in. (4), 3.1.1,. (5),,.,., 20. (6).,.,,., σw 2 σ2 B., σw 2 /σ2 B,., Fig. 8. 4.2.2 4.2.1, (2) (2 ), (3) NLMeans, (4), (5) (20 ) Fig. 9. (2) (2 ) CPU 3.33 0.00, GPU 2.54 0.07 [s]. (3) NLMeans CPU 89.7 0.06 [s], GPGPU 8.77 0.07 [s]. (4) CPU 121 0.00 [s], GPGPU 2345 0.17 [s]. (5) (20 ) CPU 3.09 0.00, GPU 2.63 0.07 [s]. (2), (3), (5) GPGPU., (4), GPGPU. 4.2.3 1 Fig. 10. Fig. 7 GPGPU, CPU 2.12 0.13 [s], GPU 25.3 0.35 [s]. CPU., GPGPU,., CPU., CPU, GPGPU (GPGPU ), CPU. 8
5 GPU, CPU GPU, 4pixel, Fig. 11., CPU., GPU, GPU,,,., CPU GPGPU, GPGPU,.,,.,.,,,. 9
6,,.,,.,,,.,,.,, GPU. GPU, CPU, GPU., (1), (2) (2 ), (3) NLMeans, (4), (5) (20 ), (6).,, GPGPU., GPGPU, CPU., CPU GPGPU, GPGPU,. GPGPU,,. 10
3,,.,,,,,,.,,,,,.,.,..,.,,,.,,.,,,,.,,.,,,.,,.,.,., 3.,,... 11
1 Comparison between Fermi and Kepler architecture............ 1 2 About CUDA application........................... 2 3 Original cell image............................... 3 4 Local histogram equalization......................... 3 5 Working flow of spatio-temporal image processing.............. 3 6 Overall histogram equalization........................ 3 7 Sequence of image processing......................... 4 8 Result image.................................. 4 9 Processing Speed of each filter process.................... 4 10 Processing Speed................................ 5 11 Processing sequence of local histogram equalization on GPU........ 5 1 Specification of NVIDIA GRID K2...................... 1 2 Specification of the machine.......................... 1
Fig. 1 Comparison between Fermi and Kepler architecture Table 1 Specification of NVIDIA GRID K2 GPU GTX680 x2 processor cores 1,536 clock rate [MHz] 745 global memory [Kbyte] 4 memory clock rate [KHz] 2.5 shared memory [Kbyte/block] 48 L1 cashe [Kbyte] 64 L2 cashe [Kbyte] 512 Table 2 Specification of the machine OS Microsoft Windows 7 Enterprise OS version 6.1.7601 Service Pack 1 memory [GB] 16.38 processor Intel c Xeon R CPU E5-2680 v2 clock rate [GHz] 2.8 1
(a) CUDA Application (b) CUDA Programming Work flow (c) GPU Architecture Fig. 2 About CUDA application 2
Fig. 3 Original cell image Fig. 4 Local histogram equalization Fig. 5 Working flow of spatio-temporal image processing Fig. 6 Overall histogram equalization 3
Fig. 7 Sequence of image processing Fig. 8 Result image (a) Processing Speed of each filter process (b) Core Occupation of GPU (c) Core Occupation of GPU (d) Core Occupation of GPU Fig. 9 Processing Speed of each filter process 4
Fig. 10 Processing Speed Fig. 11 Processing sequence of local histogram equalization on GPU 5