GPGPU


GPGPU 2013 1008 2015 1 23

Abstract In recent years, advances in microscope technology have made it possible to observe living cells. In image processing, meanwhile, quality assessment that determines whether cells have been cultured successfully has been studied. Cell image processing is therefore applied not only to static images but also to moving images. However, because a moving image is a sequence of static images, processing it imposes a heavy computational load. To address this problem, we focused on the large number of cores contained in Graphics Processing Units (GPUs). In this study, aiming at a high-quality cell segmentation system, we built a spatio-temporal image processing system that uses General-Purpose computing on Graphics Processing Units (GPGPU) for parallel processing. To evaluate the proposed system, we compared the processing speed of the spatio-temporal image processing in two cases: using only the CPU, and using GPGPU. As a result, processing with GPGPU was faster than with the CPU. Moreover, the difference between the two processing speeds widened as the amount of data increased. These results suggest that GPGPU is useful in a cell segmentation system for moving images.

1
2 GPGPU
  2.1 CPU
  2.2 GPU
  2.3 GPGPU GPU
  2.4 CUDA
3 GPGPU
  3.1
  3.2
4
  4.1
  4.2
5
6

1,,?.,, 1.,, 3?. (1) X,. (2),. (3)., X??,.,,,,?.,,,.,,., 30 [ms]?, 1, 33.,, Graphics Processing Unit (GPU). GPU,,, Central Processing Unit (CPU)?. GPU, GPU (General-purpose computing on graphics processing units: GPGPU)., CPU GPU?,., GPGPU.., 2, GPU,.,??,,,. 3,,., 4, (1)CPU (2)GPGPU, GPGPU. 5 CPU, GPGPU, 6 GPGPU. 1 http://www.nedo.go.jp/activities/zz 00184.html 1

2 GPGPU, GPU, GPGPU. 2.1 CPU CPU,,,., CPU. 18, CPU 2.1..

$$p = 2^{n/1.5} \tag{2.1}$$

, 2010,.,, CPU. 2, CPU, CPU., CPU. CPU, CPU., 1 1,. CPU,.,,. 2.2 GPU 2.1, CPU,.,, CPU,,.,, GPU. GPU,., 1, 1,,,. GPU,. GPGPU,, GPU.,, NVIDIA GPU C

Compute Unified Device Architecture (CUDA). CUDA, CUDA,, Fermi. 2.3 GPGPU GPU 2.3.1 Fermi Fermi, NVIDIA GPGPU. GPU,. Fermi, Streaming Processor (SP) (CUDA ), SP 32 Streaming Multi-processor (SM). Fermi SM 16, SP 512 (32x 16)., Fermi.,., GPU, 1., 1,.,,., Fermi, L1, L2,, GPU. Fermi GPU,,., Kepler. 2.3.2 Kepler Kepler, Fermi. Kepler 1W, Fermi 3.,. Fermi Fig. 1. Kepler Fermi 2,,., Fermi GPU, Kepler CPU, GPU., SP Fermi 32, 192 SMX. Kepler GPU GTX680, SMX 8, SP 1,536 (192x 8). 3
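The core counts quoted for Fermi and Kepler follow from multiplying the number of SPs per multiprocessor by the number of multiprocessors. A minimal Python sketch of that arithmetic is given below; the function name `total_cores` is ours and is used for illustration only.

```python
def total_cores(cores_per_multiprocessor: int, multiprocessors: int) -> int:
    """Total number of streaming processors (CUDA cores) on one GPU."""
    return cores_per_multiprocessor * multiprocessors

# Figures quoted in the text.
fermi = total_cores(32, 16)    # 32 SPs per SM, 16 SMs
kepler = total_cores(192, 8)   # 192 SPs per SMX, 8 SMXs (GTX680)

print(fermi, kepler)
```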

2.4 CUDA 2.3, GPGPU GPU,, CPU., GPGPU CPU. CPU GPGPU, CUDA(Fig. 2(a) ). CUDA, CPU, GPGPU,, CPU GPGPU.. CPU CUDA Application Programming Interface(, API), CPU GPGPU., GPGPU CUDA C Language, GPGPU. GPU., CPU GPGPU, CUDA. CUDA, (Fig. 2(b) ). 1 GPU. 2 CPU GPU 3. 4. GPU. 5 GPU CPU. 6 GPU. GPGPU Fig. 2(c) Grid, Block, Thread 3,. Block, CPU GPGPU., GPU, GPGPU., GPGPU,. 4
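Fig. 2(c) names the three levels of the CUDA execution hierarchy: Grid, Block and Thread. As a rough illustration, and not the code used in this study, the following pure-Python sketch emulates how a one-dimensional launch assigns one array element to each thread. The names `launch_1d` and `scale_kernel` are hypothetical. In CUDA C the index would be computed as `blockIdx.x * blockDim.x + threadIdx.x`, and the threads would run in parallel rather than in nested loops.

```python
import math

def scale_kernel(block_idx, block_dim, thread_idx, data, out, factor):
    """Body executed by one emulated thread."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(data):                       # guard: the last block may be partly idle
        out[i] = data[i] * factor

def launch_1d(kernel, n, block_dim, *args):
    """Emulate a 1-D launch: a grid of blocks, each holding block_dim threads."""
    grid_dim = math.ceil(n / block_dim)      # enough blocks to cover n elements
    for block_idx in range(grid_dim):        # blocks (parallel on a real GPU)
        for thread_idx in range(block_dim):  # threads (parallel on a real GPU)
            kernel(block_idx, block_dim, thread_idx, *args)
    return grid_dim

data = list(range(10))
out = [0] * len(data)
blocks = launch_1d(scale_kernel, len(data), 4, data, out, 2)
print(blocks, out)
```

With 10 elements and 4 threads per block, three blocks are launched and the last two threads of the third block do nothing, which is why the bounds check inside the kernel is needed.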

3 GPGPU, 2. 1. 2 GPGPU. 1 Fig. 3. Fig. 3,,.,. Fig. 4.,,,.,,. Fig. 5., GPGPU, (1). 3.1 3.1.1, ( ).,.,,., (Fig. 3),,. Fig. 3,, ( ), ( ) Fig. 6,.,.,,,.., Fig. 4. 1. 2,. 3, 1 (2). 4 (2) (3). 5

5,.,.,,. 3.1.2, 3. 3, x, y,.,.,,,. Fig. 5.,.,. 3.2 GPU GPGPU, NVIDIA 2 GPU C Compute Unified Device Architecture (CUDA). CUDA, CPU GPU. CUDA. 1 GPU. 2 CPU GPU 3. 4. GPU. 5 GPU CPU. 6 GPU.

2 Parallel Programming and Computing Platform NVIDIA http://www.nvidia.com/object/cuda_home_new.html
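Fig. 4 and Fig. 6 refer to local and overall histogram equalization. The sketch below shows the standard cumulative-histogram mapping for 8-bit grey levels together with a tile-wise local variant. It illustrates the general technique under our own assumptions and is not the implementation used in this study; the names `equalize` and `equalize_local` and the tile size are illustrative.

```python
def equalize(pixels, levels=256):
    """Overall histogram equalization of a flat list of grey levels."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # Cumulative distribution of the grey levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # flat region: nothing to stretch
        return list(pixels)
    scale = (levels - 1) / (n - cdf_min)
    return [round((cdf[v] - cdf_min) * scale) for v in pixels]

def equalize_local(image, tile):
    """Local variant: equalize each tile x tile region independently."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            coords = [(y, x)
                      for y in range(y0, min(y0 + tile, h))
                      for x in range(x0, min(x0 + tile, w))]
            eq = equalize([image[y][x] for y, x in coords])
            for (y, x), v in zip(coords, eq):
                out[y][x] = v
    return out

low_contrast = [100, 100, 101, 102, 102, 103]
print(equalize(low_contrast))  # [0, 0, 64, 191, 191, 255]
```

In this sketch every tile is processed independently of the others, so the tiles could in principle be distributed over GPU blocks.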

4 4.1 GPU Table 1, GPGPU Table 2. GTX680 2. 4.2 4.2.1 (Fig. 7). 1 2 (2 ) 3 Non Local(NL) Means 4 5 (20 ) 6, (1).,,. (2) 3.1.2 2,. (3)NLMeans. NLMeans,,.,, (, L2 ).,,.,,, NLMeans (1 x 1),..

$$\omega(p, q) = \frac{1}{Z(p)} \exp\left(-\frac{\max\left(\lVert v(p) - v(q) \rVert_2^2 - 2\sigma^2,\ 0\right)}{h^2}\right) \tag{4.1}$$

$$Z(p) = \sum_{q \in S} \exp\left(-\frac{\max\left(\lVert v(p) - v(q) \rVert_2^2 - 2\sigma^2,\ 0\right)}{h^2}\right) \tag{4.2}$$

$$\mathrm{out}(p) = \sum_{q \in S} \omega(p, q)\,\mathrm{in}(q) \tag{4.3}$$

2 GTX 680 Kepler Whitepaper - GeForce http://www.geforce.com/active/en_US/en_US/pdf/GeForce-GTX-680-Whitepaper-FINAL.pdf
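Equations (4.1) to (4.3) can be tried on a small one-dimensional signal. The following Python sketch is illustrative only: the function name `nl_means_1d`, the patch radius, the search radius and the parameter values are our assumptions, and v(p) is taken to be the vector of samples in a patch centred on p, with indices clamped at the borders.

```python
import math

def nl_means_1d(signal, patch_radius=1, search_radius=3, sigma=0.0, h=1.0):
    """Non-local means on a 1-D signal, following eqs. (4.1)-(4.3)."""
    n = len(signal)

    def patch(p):
        # v(p): samples around p, with indices clamped at the borders.
        return [signal[min(max(p + k, 0), n - 1)]
                for k in range(-patch_radius, patch_radius + 1)]

    out = []
    for p in range(n):
        vp = patch(p)
        weights, values = [], []
        # S: search window around p.
        lo, hi = max(p - search_radius, 0), min(p + search_radius, n - 1)
        for q in range(lo, hi + 1):
            # Squared L2 distance between the two patches.
            d2 = sum((a - b) ** 2 for a, b in zip(vp, patch(q)))
            # Unnormalized weight, the exponential in eqs. (4.1) and (4.2).
            weights.append(math.exp(-max(d2 - 2 * sigma ** 2, 0.0) / h ** 2))
            values.append(signal[q])
        z = sum(weights)  # Z(p), eq. (4.2); at least 1 because q = p is included
        out.append(sum(w * x for w, x in zip(weights, values)) / z)  # eq. (4.3)
    return out

noisy = [0.0, 0.1, -0.1, 0.0, 1.0, 0.9, 1.1, 1.0]
print([round(x, 2) for x in nl_means_1d(noisy, h=0.5)])
```

Because each output sample is a weighted average with non-negative weights that sum to one, the result always stays within the range of the input, and samples on either side of the step are averaged mainly with samples from the same side.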

p S q w(p, q), p v(p) v(q) L2,, 0. h. out(p) in. (4), 3.1.1,. (5),,.,., 20. (6).,.,,., $\sigma_w^2$ $\sigma_B^2$., $\sigma_w^2 / \sigma_B^2$,., Fig. 8. 4.2.2 4.2.1, (2) (2 ), (3) NLMeans, (4), (5) (20 ) Fig. 9. (2) (2 ) CPU 3.33 ± 0.00, GPU 2.54 ± 0.07 [s]. (3) NLMeans CPU 89.7 ± 0.06 [s], GPGPU 8.77 ± 0.07 [s]. (4) CPU 121 ± 0.00 [s], GPGPU 2345 ± 0.17 [s]. (5) (20 ) CPU 3.09 ± 0.00, GPU 2.63 ± 0.07 [s]. (2), (3), (5) GPGPU., (4), GPGPU. 4.2.3 1 Fig. 10. Fig. 7 GPGPU, CPU 2.12 ± 0.13 [s], GPU 25.3 ± 0.35 [s]. CPU., GPGPU,., CPU., CPU, GPGPU (GPGPU ), CPU.
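The relative gain for each step can be expressed as a speedup ratio, that is, CPU time divided by GPU time. The short Python sketch below applies this to the three steps whose GPU time is lower than the CPU time, using the first value of each reported pair, which we assume to be the mean processing time in seconds. The names `timings` and `speedup` are ours.

```python
# (CPU seconds, GPU seconds) for the steps whose GPU time is lower.
timings = {
    "(2)": (3.33, 2.54),
    "(3) NLMeans": (89.7, 8.77),
    "(5)": (3.09, 2.63),
}

def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """How many times faster the GPU run is than the CPU run."""
    return cpu_seconds / gpu_seconds

speedups = {step: speedup(cpu, gpu) for step, (cpu, gpu) in timings.items()}
for step, s in speedups.items():
    print(f"{step}: {s:.2f}x")
```

Under this reading, NLMeans gains the most, at roughly ten times, while steps (2) and (5) gain between about 1.2 and 1.3 times.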

5 GPU, CPU GPU, 4pixel, Fig. 11., CPU., GPU, GPU,,,., CPU GPGPU, GPGPU,.,,.,.,,,. 9

6,,.,,.,,,.,,.,, GPU. GPU, CPU, GPU., (1), (2) (2 ), (3) NLMeans, (4), (5) (20 ), (6).,, GPGPU., GPGPU, CPU., CPU GPGPU, GPGPU,. GPGPU,,. 10


Fig. 1 Comparison between Fermi and Kepler architecture
Fig. 2 About CUDA application
Fig. 3 Original cell image
Fig. 4 Local histogram equalization
Fig. 5 Working flow of spatio-temporal image processing
Fig. 6 Overall histogram equalization
Fig. 7 Sequence of image processing
Fig. 8 Result image
Fig. 9 Processing Speed of each filter process
Fig. 10 Processing Speed
Fig. 11 Processing sequence of local histogram equalization on GPU

Table 1 Specification of NVIDIA GRID K2
Table 2 Specification of the machine

Fig. 1 Comparison between Fermi and Kepler architecture

Table 1 Specification of NVIDIA GRID K2
GPU                            GTX680 x2
processor cores                1,536
clock rate [MHz]               745
global memory [Gbyte]          4
memory clock rate [GHz]        2.5
shared memory [Kbyte/block]    48
L1 cache [Kbyte]               64
L2 cache [Kbyte]               512

Table 2 Specification of the machine
OS                  Microsoft Windows 7 Enterprise
OS version          6.1.7601 Service Pack 1
memory [GB]         16.38
processor           Intel(R) Xeon(R) CPU E5-2680 v2
clock rate [GHz]    2.8

Fig. 2 About CUDA application: (a) CUDA Application, (b) CUDA Programming Workflow, (c) GPU Architecture

Fig. 3 Original cell image

Fig. 4 Local histogram equalization

Fig. 5 Working flow of spatio-temporal image processing

Fig. 6 Overall histogram equalization

Fig. 7 Sequence of image processing

Fig. 8 Result image

Fig. 9 Processing Speed of each filter process: (a) Processing Speed of each filter process, (b)-(d) Core Occupation of GPU

Fig. 10 Processing Speed

Fig. 11 Processing sequence of local histogram equalization on GPU