main.dvi


PC 1 1 [1][2] [3][4] ( ) GPU (Graphics Processing Unit) GPU PC GPU PC ( 2 GPU ) GPU Harris Corner Detector [5] CPU ( ) ( ) CPU GPU 2 3 GPU 4 5 6 7

1 toyohiro@isc.kyutech.ac.jp

2 ( ) CPU ( ) ( ) () 2.1 ( 1) 1. 2. ( ) 3. 2.2 () CPU SIMD PC (CPU) SIMD (Intel: SSE4, AMD: SSE5) [6] CPU 4 32bit

[Figure 1: input, ( ), output]

SMP (Symmetric Multi Processing) CPU (Intel: Core2, AMD: Phenom) SMP SMP OpenMP [7] CPU ( ) SMP 16 (= ) PC MPI [8] (Ethernet ) 3 GPU (Graphics Processing Unit) GPU 3 PC. GPU NVIDIA (GeForce) AMD (ATI) (Radeon). GPU

PC 1 GPU

Table 1: GPU (Nvidia GeForce 8800GT)

Clock frequency           1.8GHz
Multiprocessors           14
Parallel threads          7,168
Memory (VRAM) capacity    512MByte
Peak performance          336GFlops
Memory bandwidth          57.6GByte/sec

PC PCI-express 2.0 PC 8GByte/sec GPU GPU GPGPU [9] 2 GPU (2006 ) 3 (OpenGL Direct3D) NVidia C GPU (CUDA: Compute Unified Device Architecture) [10] GPU GPU GPU ( 2)

[Figure 2: GPU. Components: CPU (main), bus bridge, PCIe (8GB/sec), VRAM, GPU, RAM]
1. Copy input data from RAM to GPU
2. Copy sub input data from VRAM to each GPU core
3. Execution, copy sub result to VRAM
4. Copy result to RAM

1. GPU 2. GPU GPU (CUDA ) 2 [11][12]
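The four steps listed for Figure 2 can be imitated on the CPU to make the data movement explicit. The following is only a minimal sketch in plain Python, not the paper's CUDA code: `run_on_device` and `transfer_ms` are hypothetical names, the "cores" are simulated by list slices, and the 3 bytes per pixel in the usage note is an assumption. The 8 GByte/sec figure is the PCI-Express bandwidth quoted above.

```python
def run_on_device(host_input, n_cores, kernel):
    """Simulate the four-step flow: RAM -> VRAM -> cores -> VRAM -> RAM."""
    # Step 1: copy the input from host RAM to device memory (VRAM).
    vram_in = list(host_input)
    # Step 2: give each core its own slice of the input (ceiling division).
    chunk = max(1, -(-len(vram_in) // n_cores))
    sub_inputs = [vram_in[i:i + chunk] for i in range(0, len(vram_in), chunk)]
    # Step 3: every core applies the kernel and writes its sub-result to VRAM.
    vram_out = []
    for sub in sub_inputs:
        vram_out.extend(kernel(x) for x in sub)
    # Step 4: copy the result back to host RAM.
    return list(vram_out)


def transfer_ms(n_bytes, bandwidth_bytes_per_sec=8e9):
    """Lower bound on the copy time over the bus, in milliseconds."""
    return n_bytes / bandwidth_bytes_per_sec * 1e3
```

For instance, `transfer_ms(2048 * 2048 * 3)` estimates the one-way copy of a 2048 × 2048 image with an assumed 3 bytes per pixel at roughly 1.57 msec, which shows why steps 1 and 4 cannot be ignored for small kernels.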

3. GPU GPU 4. GPU GPU GPU PC 1 GPU 8.0GByte/sec PC CPU 10.6GByte/sec GPU CPU 14 7,000 GPU 300GFlops PC CPU 7 3 GPU CPU 4 GPU 3 1. 1 1 2 2 1 1 ( 3(a)) 2. 1. ( 3(b)) 2 3. Harris Corner Detector CUDA 1.1 4 Microsoft Windows ( 4.5 ) GPU CUDA NVidia GeForce 8800GT 4.1 1 1 4.1.1 $I_{in}(x, y, c)$ $I_{out}(x, y, c)$

3 2010 2 GPU 1TFlops
4 2010 2 CUDA 2.3

[Figure 3: (a) Input data I(1) to I(5) from RAM, processing on GPU (one proc per element), result data R(1) to R(5) to RAM. (b) Input data I(1) to I(5) from RAM, processing on GPU (one proc per element), sub result S(1) to S(5), merging by exclusive operation, result data R(1) to R(3) to RAM.]

RGB YUV HSI HSV rgb GPU RGB Sobel Prewitt LoG Sobel Prewitt

4.1.2 I C

$$\mathrm{Filter}(I, C) = I * C = \mathrm{IFFT}(\mathrm{FFT}(I) \cdot \mathrm{FFT}(C))$$

CUDA FFT (CUDAFFT)

4.1.3 2 2 2 2 Harris Corner Detector 2 2
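The identity Filter(I, C) = IFFT(FFT(I) · FFT(C)) can be checked numerically. The sketch below is a one-dimensional illustration in plain Python with a naive O(n²) DFT, not the CUDA FFT routine the paper relies on; the function names are hypothetical. It assumes circular (wrap-around) convolution and a kernel zero-padded to the signal length.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); enough for a demonstration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_convolve_direct(signal, kernel):
    """Reference result: circular convolution computed from its definition."""
    n = len(signal)
    return [sum(signal[(i - j) % n] * kernel[j] for j in range(n)) for i in range(n)]


def circular_convolve_fft(signal, kernel):
    """Same convolution as a pointwise product in the frequency domain."""
    spectrum = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [v.real for v in dft(spectrum, inverse=True)]
```

Both functions agree up to rounding error on inputs such as `[1, 2, 3, 4, 0, 0, 0, 0]` and `[1, 1, 1, 0, 0, 0, 0, 0]`. For images the same product is taken over two-dimensional transforms, and the borders need padding if wrap-around is not wanted.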

4.2 4.2.1 ( 4)

[Figure 4: Grayscale input I(1) to I(5), one proc per element, voting by exclusive operation (AtomicAdd()), H(1) to H(3), result data R(1) to R(3)]

CUDA ( ) AtomicAdd

4.2.2 H1 H2 HIN(H1,H2) D1 D2 CORR(D1,D2)

$$\mathrm{HIN}(H1, H2) = \sum_{i=1}^{i_{\max}} \min(H1(i), H2(i))$$

$$\mathrm{CORR}(D1, D2) = \frac{\sum_i (D1_i - \overline{D1})(D2_i - \overline{D2})}{\sqrt{\sum_i (D1_i - \overline{D1})^2 \sum_i (D2_i - \overline{D2})^2}}$$
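The voting scheme of Figure 4 and the two comparison measures HIN and CORR are short enough to write out. This is a sequential sketch in plain Python under stated assumptions: the function names are hypothetical, pixel values are assumed to be integers in `[0, n_bins)`, and the plain `+= 1` stands in for the atomic increment (AtomicAdd) that parallel GPU threads need when they vote into the same bin. CORR is written as the usual normalized correlation coefficient.

```python
import math


def histogram_by_voting(pixels, n_bins=256):
    """Each pixel votes for its bin; on a GPU this increment must be atomic."""
    hist = [0] * n_bins
    for p in pixels:
        hist[p] += 1  # stands in for an atomic add on shared memory
    return hist


def histogram_intersection(h1, h2):
    """HIN(H1, H2): sum over bins of the smaller of the two counts."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def correlation(d1, d2):
    """CORR(D1, D2): covariance normalized by both standard deviations."""
    m1 = sum(d1) / len(d1)
    m2 = sum(d2) / len(d2)
    num = sum((a - m1) * (b - m2) for a, b in zip(d1, d2))
    den = math.sqrt(sum((a - m1) ** 2 for a in d1) * sum((b - m2) ** 2 for b in d2))
    return num / den  # undefined (division by zero) for constant input
```

A larger intersection means more overlap between the two histograms, while the correlation lies between -1 and 1 and is insensitive to a constant offset or positive scaling of either input.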

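4.3 4.3.1 Harris Corner Detector

Harris Corner Detector Harris Corner Detector ( 5)

[Figure 5: Harris Corner Detector. Input image; red points: corners]

1. C I
2. $I_{xx} = (\partial I / \partial x)^2$, $I_{yy} = (\partial I / \partial y)^2$, $I_{xy} = (\partial I / \partial x)(\partial I / \partial y)$
3. Gaussian $G$: $A = G * I_{xx}$, $B = G * I_{yy}$, $C = G * I_{xy}$ ( )
4. $H_i = \begin{pmatrix} A_i & C_i \\ C_i & B_i \end{pmatrix}$, $\lambda_1, \lambda_2$
5. $M_i = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2$

1 1 GPU I

4.4 GPU 7168 5 640 × 480 7168 43 5 GPU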
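Steps 2 to 5 of the Harris Corner Detector can be followed on a small grayscale image. The sketch below is plain Python rather than the paper's CUDA implementation, and several choices are assumptions not taken from the source: central differences for the derivatives, a 3 × 3 Gaussian kernel, clamped borders, and α = 0.04. It uses the facts that λ1 λ2 equals the determinant of H and λ1 + λ2 equals its trace, so no eigenvalue solver is needed.

```python
def harris_response(gray, alpha=0.04):
    """Return M = det(H) - alpha * trace(H)^2 for every pixel of a 2-D list."""
    h, w = len(gray), len(gray[0])

    def at(img, y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Step 2: image derivatives and their products.
    ix = [[(at(gray, y, x + 1) - at(gray, y, x - 1)) / 2.0 for x in range(w)] for y in range(h)]
    iy = [[(at(gray, y + 1, x) - at(gray, y - 1, x)) / 2.0 for x in range(w)] for y in range(h)]
    ixx = [[ix[y][x] ** 2 for x in range(w)] for y in range(h)]
    iyy = [[iy[y][x] ** 2 for x in range(w)] for y in range(h)]
    ixy = [[ix[y][x] * iy[y][x] for x in range(w)] for y in range(h)]

    # Step 3: Gaussian smoothing (3 x 3 kernel, weights sum to 16).
    g = ((1, 2, 1), (2, 4, 2), (1, 2, 1))

    def smooth(img):
        return [[sum(g[dy + 1][dx + 1] * at(img, y + dy, x + dx)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 16.0
                 for x in range(w)] for y in range(h)]

    a, b, c = smooth(ixx), smooth(iyy), smooth(ixy)

    # Steps 4 and 5: H = [[A, C], [C, B]], M = det(H) - alpha * trace(H)^2.
    return [[a[y][x] * b[y][x] - c[y][x] ** 2 - alpha * (a[y][x] + b[y][x]) ** 2
             for x in range(w)] for y in range(h)]
```

On an image containing one bright square, the response is positive at the corner of the square, negative along its straight edges, and zero in flat regions, which is why thresholding M isolates corners. Every pixel is computed independently of the others, which is what makes this detector a good fit for one-thread-per-pixel execution.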

GPU GPU GPU CPU GPU1 ( 6) CPU CPU (Multi core) OpenMP SMP

[Figure 6: GPU. Components: bus bridge; GPU 1: CPU core 1, VRAM; GPU 2: CPU core 2, VRAM; RAM]

2 GPU 14336 22

4.5 GPU CPU C GPU CUDA Microsoft Windows DLL (Dynamic Link Library) DLL C Matlab 6

5 GPU 4 (256 × 256, 512 × 512, 1024 × 1024, 2048 × 2048) RGB HSV Sobel 7 × 7 Gaussian 256

6 Matlab DLL (loadlibrary )
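The figures 43 (for 7168) and 22 (for 14336) quoted for a 640 × 480 image are consistent with a simple ceiling division, assuming one pixel per parallel unit per round. The sketch below reproduces that arithmetic in plain Python; `rounds_needed` and `split_rows` are hypothetical helpers, and dividing the image into horizontal bands per GPU is an assumption, since the way the paper partitions the work between the two GPUs is not recoverable from the text.

```python
def rounds_needed(width, height, parallel_units):
    """Rounds required when each unit handles one pixel per round."""
    pixels = width * height
    return -(-pixels // parallel_units)  # ceiling division on integers


def split_rows(height, n_devices):
    """Assign contiguous bands of rows (start, stop) to each device."""
    base, extra = divmod(height, n_devices)
    bands, start = [], 0
    for d in range(n_devices):
        stop = start + base + (1 if d < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands
```

With one GPU, `rounds_needed(640, 480, 7168)` gives 43; doubling the parallel units to 14336 gives 22, so the round count is roughly halved but not exactly, because the last round is only partly filled.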

2 2 4 (256 × 256, 512 × 512, 1024 × 1024, 2048 × 2048) 3 (Matlab (CPU ) GPU GPU2 (CUDA )) GPU Matlab DLL GPU2 CPU GPU GPU ( 2 5 ) Matlab CPU Intel Core2 Quad Q6600 ( 40GFlops) 4GByte CPU 10.6GByte/sec GPU 8GByte/sec (PCI-Express) CPU 2 5

Table 2: (image size: 256 × 256), unit: msec
CPU    GPU    GPU × 2
44.31  9.620  6.941
42.49  16.44  13.92
1.988  6.922  7.092  17.19  (9.302)
0.012  0.978  (0.651)
8.070  3.482  (3.140)
158.0  17.47  11.21
551.3  61.48  53.10

Table 3: (image size: 512 × 512), unit: msec
CPU    GPU    GPU × 2
187.6  30.27  19.79
198.1  63.98  51.60
17.31  13.73  26.34  68.74  (45.70)
0.013  1.292  (0.716)
18.09  8.660  (8.547)
625.8  55.45  34.93
2295   163.2  130.9

Table 4: (image size: 1024 × 1024), unit: msec
CPU    GPU    GPU × 2
774.7  132.9  79.65
862.1  259.0  213.6
89.84  47.85  103.4  274.8  (168.8)
0.014  1.350  (0.7697)
60.83  30.29  (28.65)
2529   199.0  124.3
9062   547.0  447.5

Table 5: (image size: 2048 × 2048), unit: msec
CPU    GPU    GPU × 2
3033   398.3  336.4
3598   1027   851.9
399.0  224.0  429.5  1103  (568.6)
0.016  1.364  (0.7170)
232.1  117.0  (115.4)
10163  793.6  484.3
36405  2284   1853

6 256 × 256 GPU GPU CPU 2 1. GPU CPU GPU 2. GPU GPU CPU CPU 16 GPU1 GPU2 1.67 GPU ( ) 1.15 2 GPU
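The factors 16 and 1.67 mentioned in the discussion can be reproduced from the timing tables as plain ratios of elapsed times. The sketch below is only that arithmetic in Python; `speedup` is a hypothetical helper, and which table rows the authors actually used is an assumption, inferred from the fact that the ratios match.

```python
def speedup(baseline_ms, improved_ms):
    """How many times faster the improved run is than the baseline."""
    return baseline_ms / improved_ms


# Last row of Table 5 (2048 x 2048): CPU 36405 msec vs. one GPU 2284 msec.
cpu_vs_gpu = speedup(36405, 2284)

# First row of Table 4 (1024 x 1024): one GPU 132.9 msec vs. two GPUs 79.65 msec.
gpu_vs_two_gpus = speedup(132.9, 79.65)
```

The first ratio is about 15.9, in line with the factor of 16 over the CPU, and the second is about 1.67, which stays below the ideal factor of 2 for two GPUs.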

6.1 GPU GPU CPU GPU GPU GPU CPU GPU GPU CPU GPU GPU

7 GPU GPU DLL Harris Corner Detector 512 × 512 GPU GPU PC

[1] 35 5 pp.582-587 2006
[2] 6 H pp.17-20 2007
[3] 2007 10 pp.53-57 2007
[4] , , , 10 pp.1283-1288 2007
[5] A combined corner and edge detector, C. Harris and M. Stephens, Proceedings of the 4th Alvey Vision Conference, pp.147-151, 1988.
[6] Intel Streaming SIMD Extensions 4 (SSE4) Instruction Set, Intel Corp., http://www.intel.com/technology/architecture-silicon/sse4-instructions/, 2007
[7] The OpenMP specification for parallel programming, OpenMP Architecture Review Board, http://www.openmp.org/
[8] Message Passing Interface Forum, MPI Forum, http://www.mpi-forum.org/
[9] General-Purpose Computation Using Graphics Hardware, http://www.gpgpu.org/
[10] NVIDIA CUDA Zone, NVIDIA Corp., http://www.nvidia.com/object/cuda_home.html, 2007
[11] GPU-based implementation of the KLT Tracker, http://cs.unc.edu/~ssinha/research/gpu_KLT/
[12] GPU-based implementation of Scale Invariant Feature Transform, http://cs.unc.edu/~ccwu/siftgpu/