(MIRU2010) NTT Graphic Processor Unit GPU graphi


Transcription:

(MIRU2010), July 2010

Fully automatic extraction of salient objects in near real-time

Kazuma AKAMINE, Ken FUKUCHI, Akisato KIMURA, and Shigeru TAKAGI

Department of Computer Science and System Engineering, Faculty of Engineering, Miyazaki University, 1-1 Gakuen Kibanadai-Nishi, Miyazaki, 889-2192 Japan
Department of Information and Communication Systems Engineering, Okinawa National College of Technology, Henoko 905, Nago, Okinawa, 905-2171 Japan
NTT Communication Science Laboratories, NTT Corporation, Morinosato Wakamiya 3-1, Atsugi, Kanagawa, 243-0198 Japan
E-mail: ac094608@edu.okinawa-ct.ac.jp, akisato@ieee.org

Key words: graphic processor unit (GPU), CUDA

1. (MRF) (MAP) MRF Greig [1] Boykov Interactive Graph Cuts [2] Kohli Dynamic Graph Cuts [3] MRF MAP [4] [5] Fu [6] [7]

Graphic Processor Unit (GPU) SIMD (Single Instruction Multiple Data) GPU CPU [8] [9] [10] GPU general-purpose GPU (GPGPU) [11] GPU C++ CUDA [12] OpenCL [13] GPU GPU CUDA GPU CUDA 2 GPU CUDA 3 [7] 4 GPU 5 6

2. Graphic Processor Unit GPU GPU CUDA

2.1 GPU (Graphics Processing Unit) GPU GPU SIMD 1 1 2 GPU GPGPU GPU GPU NVIDIA CUDA [12] AMD ATI Stream [14] CPU GPU OpenCL [13] Microsoft API DirectCompute [15]

2.2 CUDA GPU HLSL [15] GLSL [16] NVIDIA GPU CUDA GPU CUDA C GPU C GPU 1 ID ID 32 32 2 CPU GPU GPU CPU GPU CPU GPU

2.3 CUDA CUDA CPU GPU

Fig. 2 CPU GPU

Fig. 4 CUDA

    // Callable from both CPU and GPU code.
    __host__ __device__ int add(int a, int b) {
        return a + b;
    }

    // Kernel: each thread adds one pair of elements.
    __global__ void addall(int length, int *a, int *b, int *result) {
        int px = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (px < length) {                               // guard the last block
            result[px] = add(a[px], b[px]);
        }
    }

    int main() {
        ...
        dim3 blocks(10, 1, 1);    // unused dimensions must be 1, not 0
        dim3 threads(32, 1, 1);
        int sm_cap = 0;           // dynamic shared memory size in bytes
        addall<<<blocks, threads, sm_cap>>>(length, a, b, result);
        ...
    }

Fig. 3 CUDA CPU GPU

Table 1 CUDA

    Qualifier     Called from   Runs on
    __host__      CPU           CPU
    __global__    CPU           GPU
    __device__    GPU           GPU

GPU __device__ __global__ void __host__ __device__ Fig. 3 3 GPU a b addall GPU add CPU GPU GPU addall <<<>>> 3 blocks 5 2 GPU CPU threads 1 sm_cap

2.4 CUDA CUDA GPU 4 5 6 2 CUDA GPU 6 2 16KB __syncthreads()
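The index computation in the addall kernel can be checked on the host without a GPU. The following is a minimal sketch, not the authors' code: the two loops stand in for blockIdx.x and threadIdx.x, and the helper names blocksFor and addAllEmulated are hypothetical. It also shows the usual ceiling division for choosing the block count, which the listing above replaces with a fixed value of 10.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the paper): smallest block count such that
// blocks * threadsPerBlock >= length.
int blocksFor(int length, int threadsPerBlock) {
    assert(length >= 0 && threadsPerBlock > 0);
    return (length + threadsPerBlock - 1) / threadsPerBlock;
}

// Host-side emulation of the addall launch. The outer and inner loops play
// the roles of blockIdx.x and threadIdx.x, which a GPU would run in parallel.
std::vector<int> addAllEmulated(const std::vector<int>& a,
                                const std::vector<int>& b,
                                int threadsPerBlock) {
    assert(a.size() == b.size());
    const int length = static_cast<int>(a.size());
    std::vector<int> result(a.size(), 0);
    const int blocks = blocksFor(length, threadsPerBlock);
    for (int blockIdx = 0; blockIdx < blocks; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < threadsPerBlock; ++threadIdx) {
            const int px = blockIdx * threadsPerBlock + threadIdx;
            if (px < length) {  // same bounds guard as in the kernel
                result[px] = a[px] + b[px];
            }
        }
    }
    return result;
}
```

With 32 threads per block, 10 blocks cover at most 320 elements; a vector of 321 elements would need an 11th block whose remaining 31 threads are masked out by the guard.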

Fig. 6 Saliency Graph Cuts

Fig. 7

CPU GPU GPU 64KB GPU CPU GPU GPU CPU GPU CPU GPU CPU GPU

3. SGC [7] SGC Fig. 6 SGC (Fig. 6(a)) [17] (Fig. 6(b)) (Lucas-Kanade [18]) [19] / (Fig. 6(c1)) $\Pr(O)$ $\Pr(B)$ $p$ $\Pr(O; p)$ $\Pr(O; p)$ 0 $\Pr(B; p) = 1 - \Pr(O; p)$ Boykov [20] Fig. 7 $S$ $T$ t-link n-link t-link (1)(2) n-link (5) $C_p$ $p$ $\Pr(C_p \mid O)$ $\Pr(C_p \mid B)$ $\Pr(O \mid C_p)$ $\Pr(B \mid C_p)$ $I_p$ $I_q$ $p$ $q$ $\mathrm{dist}(p, q)$ $p$ $q$

$$R_p(\text{"obj"}) = -\ln \Pr(O \mid C_p) \tag{1}$$

$$R_p(\text{"bkg"}) = -\ln \Pr(B \mid C_p) \tag{2}$$

$$\Pr(O \mid C_p) = \frac{\Pr(C_p \mid O)\,\Pr(O; p)}{\Pr(C_p)} \tag{3}$$

$$\Pr(B \mid C_p) = \frac{\Pr(C_p \mid B)\,\Pr(B; p)}{\Pr(C_p)} \tag{4}$$

$$B_{\{p,q\}} \propto \exp\left\{-\frac{(I_p - I_q)^2}{2\sigma^2}\right\} \cdot \frac{1}{\mathrm{dist}(p, q)} \tag{5}$$

(Fig. 6(c2)) (6)(8) (GMM) GMM $d$ RGB 3 $a_k$ $S_k$ $\pi_k$ EM $\Pr(C_p \mid O)$
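Equations (1) to (5) can be written out as a few small functions. This is a sketch under stated assumptions rather than the paper's implementation: the function names are hypothetical, the evidence term Pr(C_p) is expanded over object and background so that the example is self-contained, the clamp eps is added only to keep the logarithm finite, and the proportionality in equation (5) is taken with constant 1.

```cpp
#include <cassert>
#include <cmath>

// Equation (3): Pr(O|C_p) = Pr(C_p|O) Pr(O;p) / Pr(C_p).
// Assumption: Pr(C_p) is expanded over the two labels (object, background).
double posteriorObject(double likeObj, double likeBkg, double priorObj) {
    assert(priorObj >= 0.0 && priorObj <= 1.0);
    const double priorBkg = 1.0 - priorObj;  // Pr(B;p) = 1 - Pr(O;p)
    const double evidence = likeObj * priorObj + likeBkg * priorBkg;
    assert(evidence > 0.0);
    return likeObj * priorObj / evidence;
}

// Equations (1) and (2): the t-link cost is the negative log posterior.
// eps is a hypothetical clamp, not part of the paper's formula.
double regionalTerm(double posterior, double eps = 1e-12) {
    return -std::log(posterior < eps ? eps : posterior);
}

// Equation (5): n-link weight between neighbouring pixels p and q,
// with the proportionality constant assumed to be 1.
double boundaryTerm(double Ip, double Iq, double sigma, double dist) {
    assert(sigma > 0.0 && dist > 0.0);
    const double diff = Ip - Iq;
    return std::exp(-(diff * diff) / (2.0 * sigma * sigma)) / dist;
}
```

Pixels with similar intensities get an n-link weight close to 1/dist(p, q), so cutting between them is expensive, while a large intensity difference drives the weight toward zero.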

$F(x, y)$ $P(x, y)$ $n$ $m$ $F_k(i, j)$ (9)

$$F(x, y) = \sum_{i=0}^{n} \sum_{j=0}^{m} F_k(i, j)\, P\left(x + i - \frac{n}{2},\; y + j - \frac{m}{2}\right) \tag{9}$$

Fig. 8 GPU

n $\Pr(O)$ RGB $\Pr(C_p \mid B)$ $\Pr(O)$ RGB

$$p(x; a_k, S_k, \pi_k) = \sum_{k=1}^{M} \pi_k\, p_k(x), \qquad \pi_k \geq 0 \tag{6}$$

$$\sum_{k=1}^{M} \pi_k = 1 \tag{7}$$

$$p_k(x) = \frac{1}{(2\pi)^{d/2}\,|S_k|^{1/2}} \exp\left\{-\frac{1}{2}(x - a_k)^T S_k^{-1} (x - a_k)\right\} \tag{8}$$

(Fig. 6(d)) 1

4. GPU 3 GPU Fig. 8 GPU t-link n-link GPU GPU 1 1 $P(x, y)$ $F_k(i, j)$ $F(x, y)$ $F_k(i, j)$ $P(x, y)$ Fig. 9 1-2 (filter) (src1) height width fheight fwidth result 1 Fig. 10 10 minsrc maxsrc __syncthreads 0 1 $\Pr(O)$ GPU t-link EM CUDA ZONE [12] Harp [21] EM k-means CPU GMM (6) Fig. 9 $p_k(x)$ (8) $\{(2\pi)^{d/2}\,|S_k|^{1/2}\}^{-1}$ 1 1 CPU GPU
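The normalization factor of equation (8) depends only on the component, not on the pixel, so it can be computed once per component and reused for every pixel. The sketch below illustrates that split for RGB data (d = 3). It is an illustration under assumptions, not the paper's code: the struct and function names are hypothetical, and the inverse covariance and determinant are assumed to be available from the EM step.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// One Gaussian component. Field names are illustrative, not from the paper.
struct Component {
    double weight;     // pi_k
    Vec3 mean;         // a_k
    Mat3 invCov;       // S_k^{-1}, precomputed once
    double normConst;  // 1 / ((2*pi)^{d/2} |S_k|^{1/2}), precomputed once
};

// Normalization factor of equation (8) for d = 3, given det(S_k).
double normConstant(double detCov) {
    assert(detCov > 0.0);
    const double pi = 3.14159265358979323846;
    return 1.0 / (std::pow(2.0 * pi, 1.5) * std::sqrt(detCov));
}

// Equation (8): only the exponent has to be evaluated per pixel.
double componentDensity(const Component& c, const Vec3& x) {
    Vec3 d{};
    for (int i = 0; i < 3; ++i) d[i] = x[i] - c.mean[i];
    double q = 0.0;  // (x - a_k)^T S_k^{-1} (x - a_k)
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            q += d[i] * c.invCov[i][j] * d[j];
    return c.normConst * std::exp(-0.5 * q);
}

// Equation (6): weighted sum over the M components.
double mixtureDensity(const std::vector<Component>& gmm, const Vec3& x) {
    double p = 0.0;
    for (const Component& c : gmm) p += c.weight * componentDensity(c, x);
    return p;
}
```

In a GPU version, componentDensity would be the per-pixel work, with the Component fields passed in as read-only parameters.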

    // Texture references (legacy API, as used with CUDA 2.1).
    texture<float, 1, cudaReadModeElementType> filter;
    texture<float, 1, cudaReadModeElementType> src1;

    // Filter response at pixel (x, y); the filter is centred on the pixel.
    __device__ float Filter2DCore(texture<float, 1, cudaReadModeElementType> fsource,
                                  int x, int y, int height, int width,
                                  int filtersizex, int filtersizey) {
        float sum = 0;
        x -= filtersizex / 2;
        y -= filtersizey / 2;
        for (int fy = 0; fy < filtersizey; fy++) {
            int by = y + fy;
            if (by >= 0 && by < height) {          // row 0 is a valid row
                by *= width;                       // row offset in the 1D image
                for (int fx = 0; fx < filtersizex; fx++) {
                    int bx = x + fx;
                    if (bx >= 0 && bx < width) {   // column 0 is a valid column
                        sum += tex1Dfetch(filter, fy * filtersizex + fx)
                             * tex1Dfetch(fsource, by + bx);
                    }
                }
            }
        }
        return sum;
    }

    // One thread per output pixel.
    __global__ void Filter2DKernel(int height, int width, int fheight, int fwidth,
                                   float *result) {
        int px = blockDim.x * blockIdx.x + threadIdx.x;
        if (px < height * width) {
            int x = px % width;
            int y = px / width;
            result[px] = Filter2DCore(src1, x, y, height, width, fwidth, fheight);
        }
    }

Fig. 9 CUDA

    texture<float, 1, cudaReadModeElementType> minsrc;
    texture<float, 1, cudaReadModeElementType> maxsrc;

    // First reduction stage: one minimum and one maximum per block.
    // The shared arrays assume blockDim.x <= 32.
    __global__ void SMRangeNormalizeKernel1(int length, float *localmin, float *localmax) {
        int px = blockDim.x * blockIdx.x + threadIdx.x;

        __shared__ float mini[32], maxi[32];
        if (px < length) {
            mini[threadIdx.x] = tex1Dfetch(minsrc, px);
            maxi[threadIdx.x] = tex1Dfetch(maxsrc, px);
        } else {
            mini[threadIdx.x] = FLT_MAX;
            maxi[threadIdx.x] = -FLT_MAX;  // FLT_MIN is the smallest positive float
        }
        __syncthreads();
        if (threadIdx.x == 0) {
            for (int i = 1; i < blockDim.x; i++) {
                mini[0] = min(mini[0], mini[i]);
                maxi[0] = max(maxi[0], maxi[i]);
            }
            localmin[blockIdx.x] = mini[0];
            localmax[blockIdx.x] = maxi[0];
        }
    }

Fig. 10

(8) $\exp\left\{-\frac{1}{2}(x - a_k)^T S_k^{-1} (x - a_k)\right\}$ $a_k$ $S_k^{-1}$ n-link 2 1 GPU CUDA ZONE Vineet CUDA Cuts [22] 1

5. 10 10 CPU GPU 3 352x288 480x384 640x512 3 1 Fig. 11 1 640x512 GPU Fig. 11 Table 4 Fig. 11 4 GPU Fig. 12 Table 5

Table 3

    CPU      Intel Core2Quad Q9550
    Memory   4GB
    GPU      NVIDIA GeForce 9800GT 512MB
    OS       Windows XP Professional
    Library  NVIDIA CUDA 2.1, OpenCV 1.1

Table 4, 640x512: 2.4x, 132x, t-link 4.2x, n-link 11x, 2.9x
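The speedup figures quoted for 640x512 and the per-pixel times of Table 5 follow from the per-stage timings of Table 4 by simple arithmetic. The sketch below reproduces that arithmetic; the function names are hypothetical, and the only data used are the 640x512 rows of Table 4.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Total time of one frame [ms] as the sum of its per-stage times.
double frameTimeMs(const std::vector<double>& stagesMs) {
    return std::accumulate(stagesMs.begin(), stagesMs.end(), 0.0);
}

// Per-pixel time [us] for a width x height frame.
double perPixelUs(double frameMs, int width, int height) {
    assert(width > 0 && height > 0);
    return frameMs * 1000.0 / (static_cast<double>(width) * height);
}

// Speedup of the GPU version over the CPU version.
double speedup(double cpuMs, double gpuMs) {
    assert(gpuMs > 0.0);
    return cpuMs / gpuMs;
}
```

For the 640x512 rows, the six CPU stages sum to 2333.6 ms and the six GPU stages to 516.1 ms. Dividing by 327,680 pixels gives about 7.12 us and 1.57 us per pixel, a ratio of about 4.5, which matches the last row of Table 5. The t-link stage alone gives 602.6 / 142.6, or about 4.2.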

352x288 2.1x 640x512 4.5x CUDA ZONE EM OpenCL

Fig. 12

Table 4 1 1 [ms]

    Size      Device   (a)     (b)     t-link   n-link   (e)     (f)
    352x288   CPU      32.9    148.1   218.6    9.3      97.0    71.0
    352x288   GPU      22.2    1.9     109.6    0.9      69.0    65.6
    480x384   CPU      58.8    372.8   350.8    16.7     246.5   86.4
    480x384   GPU      30.4    3.5     120.8    1.6      127.7   74.6
    640x512   CPU      109.8   814.5   602.6    29.5     664.5   112.7
    640x512   GPU      45.2    6.2     142.6    2.7      232.3   87.1

Table 5 Processing time per pixel [us]

    Size      CPU    GPU    CPU/GPU
    352x288   5.69   2.66   2.14
    480x384   6.14   1.95   3.16
    640x512   7.12   1.57   4.52

4.5 Table 5 CPU GPU GPU CPU GPU 2 GPU GPU SGC Fig. 11 Table 4 GPU t-link t-link EM 2 CUDA ZONE

6. GPU

References

[1] D. Greig, B. Porteous, and A. Seheult, "Exact maximum a posteriori estimation for binary images," J. Royal Statistical Society, Series B, Vol.51, No.2, pp.271-279, 1989.
[2] Y. Boykov and M-P. Jolly, "Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images," Proc. ICCV, Vol.I, pp.105-112, 2001.
[3] P. Kohli and P. Torr, "Dynamic graph cuts for efficient inference in Markov random fields," IEEE Trans. PAMI, Vol.29, No.12, pp.2079-2088, 2007.
[4] AdaBoost Saliency Map Graph Cuts, (MIRU2008), IS3-33, pp.796-801, July 2008.
[5] PRMU2008-232, pp.145-150, Feb. 2009.
[6] Y. Fu, J. Cheng, Z. Li, and H. Lu, "Saliency cuts: An automatic approach to object segmentation," Proc. ICPR, 2008.
[7] (MIRU2009), 2009.
[8] TSUBAME, http://www.gsic.titech.ac.jp/~ccwww/tebiki/tesla/tesla.html
[9] TMPGEnc 4.0 XPress, http://tmpgenc.pegasys-inc.com/ja/press/08_081030.html
[10] MediaShow Espresso, http://jp.cyberlink.com/prog/company/press-news-content.do?pid=2115
[11] GPU GPGPU, 15, June 2009.
[12] http://www.nvidia.com/object/cuda_home_new.html
[13] http://www.khronos.org/opencl/
[14] http://www.amd.com/us/products/technologies/stream-technology/pages/stream-technology.aspx
[15] http://msdn.microsoft.com/en-us/library/ee663301(vs.85).aspx
[16] http://www.opengl.org/documentation/glsl/
[17] MCMC-based particle filter, (MIRU2009), July 2009.
[18] B.D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision," Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI '81), pp.674-679, August 1981.
[19] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. PAMI, Vol.20, No.11, pp.1254-1259, November 1998.
[20] Y. Boykov and G. Funka-Lea, "Graph cuts and efficient N-D image segmentation," Int. J. Computer Vision, Vol.70, No.2, pp.109-131, 2006.
[21] A. Harp, "Computational Statistics via GPU," http://andrewharp.com/gmmcuda
[22] V. Vineet and P.J. Narayanan, "CUDA Cuts: Fast Graph Cuts on the GPU," CVPR Workshop on Visual Computer Vision on GPUs, 2008.