
Accelerating Out-of-core Cone Beam Reconstruction Using GPU

Yusuke Okitsu,¹ Fumihiko Ino,¹ Taketo Kishi,² Syuhei Ohnishi² and Kenichi Hagihara¹

¹ Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
² Non-Destructive Inspection (NDI) Business Unit, Analytical and Measuring Instruments Division, Shimadzu Corporation

© 2009 Information Processing Society of Japan

This paper presents a method for accelerating cone beam reconstruction of large-scale volumes using the graphics processing unit (GPU). Such a volume exceeds the capacity of video memory, so the proposed method reconstructs it as a set of subvolumes in a multi-GPU environment. There are two approaches to parallelizing this reconstruction over multiple GPUs: projection parallelism and voxel parallelism. Under projection parallelism, each GPU reconstructs the entire volume from a subset of the projections. Under voxel parallelism, each GPU reconstructs a subvolume from all of the projections. Experimental results show that our method performs out-of-core reconstruction of a 1024³-voxel volume. They also show that the NVIDIA Tesla S1070-400, which has four GPUs, is 2.6 times faster than a single GPU.

1. Introduction
Computed tomography (CT) reconstructs the interior of an object from X-ray projections acquired around it. Reconstructing a large volume such as 1024³ voxels takes a long time on the CPU, and this has motivated the use of the GPU (graphics processing unit). General-purpose computation on GPUs (GPGPU) 1) has become practical with NVIDIA's CUDA (Compute Unified Device Architecture) 2), which lets developers write GPU programs in an extension of the C language. Earlier studies 3)–5) accelerated the FDK (Feldkamp, Davis, Kress) algorithm 6) on the GPU. These include our previous work 3) and the methods of Scherl et al. 4) and Noël et al. 5).

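A 512³-voxel volume fits in the video memory (VRAM) of a GPU, but a 1024³-voxel volume does not, so the larger volume must be reconstructed out of core. In this paper, we present an out-of-core reconstruction method for a single GPU and extend it to multiple GPUs using two parallelization schemes. Section 2 reviews the FDK algorithm. Section 3 describes out-of-core reconstruction on a single GPU, and Section 4 extends it to multiple GPUs. Section 5 presents experimental results, and Section 6 concludes the paper.

2. FDK algorithm
The FDK algorithm of Feldkamp, Davis and Kress 6) reconstructs an N³-voxel volume F from K projections P_1, ..., P_K of U × V pixels each. As shown in Fig. 1, θ_n is the angle of projection P_n and D is the distance from the X-ray source X to the detector. d_n is the distance from the source to the rotation axis.

Fig. 1 Cone beam geometry: X-ray source X, detector axes u and v, projection P_n at angle θ_n, distances D and d_n, and volume F with axes x, y and z.

The algorithm first applies the Shepp–Logan filter 7) to P_1, ..., P_K to obtain filtered projections Q_1, ..., Q_K. Each Q_n(u, v), where 0 ≤ u ≤ U − 1 and 0 ≤ v ≤ V − 1, is computed from P_n using the filter half-width R and the weight W_1:

$$Q_n(u,v) = \sum_{r=-R}^{R} \frac{2}{\pi^2 (1 - 4r^2)}\, W_1(u - r, v)\, P_n(u - r, v) \tag{1}$$

$$W_1(r,v) = \frac{D}{\sqrt{D^2 + r^2 + v^2}} \tag{2}$$

The volume F(x, y, z), where 0 ≤ x, y, z ≤ N − 1, is then obtained by backprojecting Q_1, ..., Q_K:

$$F(x,y,z) = \frac{1}{2\pi K} \sum_{n=1}^{K} W_2(x,y,n)\, Q_n\bigl(u(x,y,n),\, v(x,y,z,n)\bigr) \tag{3}$$

Here W_2(x, y, n) is a distance weight. The pair (u(x, y, n), v(x, y, z, n)) is the point at which the ray from the source through voxel (x, y, z) meets the detector. Both are given by (4)–(6):

$$W_2(x,y,n) = \left(\frac{d_n}{d_n - x\cos\theta_n + y\sin\theta_n}\right)^2 \tag{4}$$

$$u(x,y,n) = \frac{D\,(x\sin\theta_n + y\cos\theta_n)}{d_n - x\cos\theta_n + y\sin\theta_n} \tag{5}$$

$$v(x,y,z,n) = \frac{D\,z}{d_n - x\cos\theta_n + y\sin\theta_n} \tag{6}$$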
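To illustrate (1)–(6), the following plain-Python sketch filters a projection and backprojects the filtered projections into a small volume. It is not the authors' CUDA implementation, and all function names are hypothetical. The sketch makes several assumptions:

- unit pixel and voxel pitch;
- coordinates measured from the detector center and from the rotation axis;
- detector pixels outside the detector treated as zero;
- bilinear interpolation of Q_n, which a GPU texture fetch would provide.

```python
import math


def shepp_logan(r):
    """Kernel term 2 / (pi^2 (1 - 4 r^2)) of Eq. (1), assuming unit pixel pitch."""
    return 2.0 / (math.pi ** 2 * (1.0 - 4.0 * r * r))


def w1(r, v, D):
    """Pre-weight W_1 of Eq. (2); r and v are offsets from the detector centre (assumed)."""
    return D / math.sqrt(D * D + r * r + v * v)


def filter_projection(P, R, D):
    """Eq. (1): weight P[v][u] by W_1 and convolve each detector row with the kernel."""
    V, U = len(P), len(P[0])
    cu, cv = (U - 1) / 2.0, (V - 1) / 2.0
    Q = [[0.0] * U for _ in range(V)]
    for v in range(V):
        for u in range(U):
            acc = 0.0
            for r in range(-R, R + 1):
                s = u - r                      # detector column reached by kernel tap r
                if 0 <= s < U:                 # pixels beyond the detector count as zero
                    acc += shepp_logan(r) * w1(s - cu, v - cv, D) * P[v][s]
            Q[v][u] = acc
    return Q


def sample(Q, u, v):
    """Bilinear interpolation of Q at pixel position (u, v); zero outside the detector."""
    V, U = len(Q), len(Q[0])
    if not (0.0 <= u <= U - 1 and 0.0 <= v <= V - 1):
        return 0.0
    u0, v0 = int(u), int(v)                    # floor, because u and v are non-negative here
    u1, v1 = min(u0 + 1, U - 1), min(v0 + 1, V - 1)
    fu, fv = u - u0, v - v0
    top = (1 - fu) * Q[v0][u0] + fu * Q[v0][u1]
    bottom = (1 - fu) * Q[v1][u0] + fu * Q[v1][u1]
    return (1 - fv) * top + fv * bottom


def backproject(Qs, thetas, ds, D, N):
    """Eqs. (3)-(6): return F[z][y][x] for an N^3 volume centred on the rotation axis."""
    K = len(Qs)
    V, U = len(Qs[0]), len(Qs[0][0])
    cu, cv, c = (U - 1) / 2.0, (V - 1) / 2.0, (N - 1) / 2.0
    F = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    for Q, th, d in zip(Qs, thetas, ds):
        sn, cs = math.sin(th), math.cos(th)
        for zi in range(N):
            for yi in range(N):
                for xi in range(N):
                    x, y, z = xi - c, yi - c, zi - c
                    L = d - x * cs + y * sn            # shared denominator of (4)-(6)
                    w2 = (d / L) ** 2                  # Eq. (4)
                    u = D * (x * sn + y * cs) / L      # Eq. (5)
                    v = D * z / L                      # Eq. (6)
                    F[zi][yi][xi] += w2 * sample(Q, u + cu, v + cv)
    scale = 1.0 / (2.0 * math.pi * K)                  # constant factor of Eq. (3)
    return [[[val * scale for val in row] for row in plane] for plane in F]


# Example: a one-row projection with a single bright pixel, backprojected into one voxel.
P = [[0.0, 0.0, 1.0, 0.0, 0.0]]
Q = filter_projection(P, R=2, D=10.0)
F = backproject([Q], thetas=[0.0], ds=[5.0], D=10.0, N=1)
print(Q[0])
print(F[0][0][0])
```

The quadruple loop mirrors (3) directly. A GPU implementation would instead assign voxels to threads and let the texture unit perform the interpolation done here by `sample`.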

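3. Out-of-core reconstruction on a single GPU
When N = 1024, the volume F occupies 4 GB, so it cannot be held in the 4-GB VRAM of a GPU together with the projection data. Each voxel F(x, y, z) in (3) can be computed independently of the others. Our method therefore divides F into B subvolumes F_1, F_2, ..., F_B (B > 1) of N³/B voxels each, so that each subvolume F_b (1 ≤ b ≤ B) fits in VRAM. It then reconstructs the subvolumes one at a time using (1)–(3).

As in our previous work 3), (6) is rewritten as (7) and (8):

$$v(x,y,z,n) = v'(x,y,n)\, z \tag{7}$$

$$v'(x,y,n) = \frac{D}{d_n - x\cos\theta_n + y\sin\theta_n} \tag{8}$$

Because v′ in (8) does not depend on z, it is computed once for each (x, y) and reused for all N voxels along the z axis. The subvolumes are laid out in VRAM with x as the innermost dimension. Neighboring threads then access consecutive addresses, which allows coalesced memory access 2).

Figure 2 shows the resulting algorithm. Filtered projections Q_n are computed only while the first subvolume is reconstructed, and they are saved in main memory (line 9). Later subvolumes reuse them instead of filtering the projections again.

Input: projections P_1, ..., P_K, filter size R, number of subvolumes B, and parameters D, d_1, ..., d_K, θ_1, ..., θ_K
Output: volume F
Algorithm Reconstruction()
 1: Divide volume F into subvolumes F_1, F_2, ..., F_B;
 2: for b = 1 to B do
 3:   Initialize subvolume F_b;
 4:   for k = 1 to K do
 5:     if b == 1 then
 6:       Transfer projection P_k to global memory;
 7:       Q_k ← Filtering(P_k, R);
 8:       Transfer filtered projection Q_k to texture memory;
 9:       Transfer filtered projection Q_k to main memory;
10:     else
11:       Transfer filtered projection Q_k to texture memory;
12:     end if
13:     Bind filtered projection Q_k as a texture;
14:     F_b ← Backprojection(Q_k, D, d_k, θ_k, k, b);
15:   end for
16:   Transfer subvolume F_b to main memory;
17: end for

Fig. 2 Out-of-core reconstruction algorithm.

4. Multi-GPU reconstruction
For C GPUs, the method can be parallelized in two ways, illustrated in Fig. 3: (1) projection parallelism and (2) voxel parallelism.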
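Both schemes distribute the single-GPU loop of Fig. 2. That loop can be sketched in plain Python with the GPU simulated by ordinary lists. The sketch makes explicit that each projection is filtered only once (while b = 1) and that later subvolumes reuse the copies saved in main memory. It is a hypothetical sketch, not the authors' code:

- The filtering and backprojection steps are passed in as functions.
- The `stats` counters are an illustrative device for counting transfers.
- Storing the volume y-major and splitting it into slabs along y is an assumption of the sketch.

```python
def reconstruct_out_of_core(projections, N, B, filter_fn, backproject_fn):
    """Control flow of Fig. 2 with the GPU simulated by Python lists.

    The volume is stored y-major as F[y][z][x], and subvolume F_b is a slab of
    N // B consecutive y rows (a splitting choice assumed by this sketch).
    backproject_fn(Fb, Q_k, k, y0) must accumulate projection k into slab Fb,
    whose first row is y = y0.
    """
    if N % B != 0:
        raise ValueError("this sketch assumes that B divides N")
    rows = N // B
    main_memory_Q = [None] * len(projections)   # filtered projections kept on the host
    stats = {"host_to_gpu": 0, "gpu_to_host": 0, "filtered": 0}
    volume = []
    for b in range(B):                          # line 2 (b counts from 0 here)
        Fb = [[[0.0] * N for _ in range(N)] for _ in range(rows)]  # line 3
        for k, P_k in enumerate(projections):   # line 4
            if b == 0:                          # line 5: first subvolume only
                stats["host_to_gpu"] += 1       # line 6: P_k to global memory
                Q_k = filter_fn(P_k)            # line 7 (line 8's on-GPU copy is not modelled)
                stats["filtered"] += 1
                main_memory_Q[k] = Q_k          # line 9: Q_k to main memory
                stats["gpu_to_host"] += 1
            else:
                Q_k = main_memory_Q[k]          # line 11: cached Q_k back to the GPU
                stats["host_to_gpu"] += 1
            backproject_fn(Fb, Q_k, k, b * rows)  # lines 13-14
        volume.extend(Fb)                       # line 16: F_b to main memory
        stats["gpu_to_host"] += 1
    return volume, stats


# Example with scalar stand-ins for Eq. (1) and Eq. (3) (hypothetical toy functions).
def toy_filter(P_k):
    return 2.0 * P_k


def toy_backproject(Fb, Q_k, k, y0):
    for plane in Fb:
        for row in plane:
            for x in range(len(row)):
                row[x] += Q_k


volume, stats = reconstruct_out_of_core([1.0, 2.0, 3.0], N=4, B=2,
                                        filter_fn=toy_filter, backproject_fn=toy_backproject)
print(stats)
```

With K = 3 projections and B = 2 subvolumes, filtering runs K = 3 times, but projections travel to the GPU B·K = 6 times. The CPU-to-GPU traffic therefore grows with the number of subvolumes, while the filtering cost does not.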

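Fig. 3 Parallelization schemes for C GPUs: (a) projection parallelism, where each GPU processes K/C projections for all B subvolumes F_b; (b) voxel parallelism, where each GPU processes B/C subvolumes F_b using all K projections.

In projection parallelism (Fig. 3(a)), each GPU evaluates the sum in (3) over its own K/C projections for every subvolume F_b. The partial volumes from the C GPUs must then be added together to obtain F(x, y, z). In voxel parallelism (Fig. 3(b)), each GPU reconstructs B/C subvolumes from all K projections. Because the voxels in (3) are independent, no summation is needed. If B < C, some GPUs receive no subvolume. Fig. 3(b) illustrates the case B = C.

Table 1 Per-GPU cost and memory usage of the two parallelization schemes.

                                            Projection parallelism   Voxel parallelism
  Projection transfer, CPU→GPU (per GPU)    O(BKUV/C)                O(BKUV/C)
  Filtering (per GPU)                       O(KUV/C)                 O(KUV)
  Backprojection (per GPU)                  O(KN³/C)                 O(KN³/C)
  Subvolume transfer, GPU→CPU (per GPU)     O(N³)                    O(N³/C)
  CPU                                       O(N³C)                   O(C)
  VRAM                                      O(N³/B)                  O(N³/B)
  Main memory                               O(N³C)                   O(N³)

Table 1 summarizes the costs, where a projection has O(UV) pixels and a subvolume F_b has O(N³/B) voxels. Under projection parallelism, each GPU transfers its K/C projections once for each of the B subvolumes. Under voxel parallelism, it transfers all K projections for each of its B/C subvolumes. Both amount to O(BKUV/C) per GPU. After backprojection, each GPU returns all B subvolumes, O(N³), under projection parallelism. Under voxel parallelism, it returns only its B/C subvolumes, O(N³/C).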
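To make the entries of Table 1 concrete, the sketch below counts per-GPU work for both schemes. It uses the experimental parameters of Section 5 (K = 1200, U × V = 1024 × 1012, N = 1024, B = 8) with C = 4. It is an illustrative model, not the authors' analysis, and it makes three assumptions:

- C divides both K and B.
- Pixels and voxels are counted instead of time.
- Voxels are 4 bytes when the VRAM term is converted to bytes.

```python
def per_gpu_costs(K, U, V, N, B, C, scheme):
    """Per-GPU element counts behind Table 1 (host memory is the total over all GPUs)."""
    if K % C or B % C:
        raise ValueError("this sketch assumes that C divides both K and B")
    sub = N ** 3 // B                       # voxels in one subvolume F_b
    if scheme == "projection":              # K/C projections for all B subvolumes
        projs, subs, host_voxels = K // C, B, C * N ** 3
    elif scheme == "voxel":                 # all K projections for B/C subvolumes
        projs, subs, host_voxels = K, B // C, N ** 3
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return {
        "projection_transfer": subs * projs * U * V,  # pixels sent CPU->GPU: O(BKUV/C)
        "filtering": projs * U * V,                   # pixels filtered: O(KUV/C) or O(KUV)
        "backprojection": subs * projs * sub,         # voxel updates: O(KN^3/C)
        "volume_readback": subs * sub,                # voxels sent GPU->CPU: O(N^3) or O(N^3/C)
        "vram_voxels": sub,                           # one resident subvolume: O(N^3/B)
        "host_voxels": host_voxels,                   # O(CN^3) or O(N^3)
    }


K, U, V, N, B, C = 1200, 1024, 1012, 1024, 8, 4     # Section 5 parameters with C = 4
proj = per_gpu_costs(K, U, V, N, B, C, "projection")
vox = per_gpu_costs(K, U, V, N, B, C, "voxel")
for key in proj:
    print(f"{key:20s} {proj[key]:>16,d} {vox[key]:>16,d}")
print("VRAM per subvolume (MiB):", 4 * vox["vram_voxels"] // 2 ** 20)
```

For these parameters, both schemes send the same 2400·U·V projection pixels to each GPU and perform the same 300·N³ voxel updates. Voxel parallelism, however, filters four times as many pixels per GPU. In exchange, it reads back a quarter as many voxels and needs a quarter of the host memory. Under the 4-byte assumption, one subvolume occupies 512 MiB of VRAM, compared with 4 GiB for the whole 1024³ volume.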

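Filtering costs O(KUV/C) per GPU under projection parallelism, because each GPU filters only K/C projections. Under voxel parallelism it costs O(KUV), because every GPU filters all K projections. Backprojection costs O(KN³/C) per GPU in both schemes. Each GPU holds one subvolume in VRAM at a time, so VRAM usage is O(N³/B) in both schemes. In main memory, projection parallelism stores B subvolumes from each of the C GPUs, O(CN³), and the CPU must add the partial volumes together. Voxel parallelism stores B/C subvolumes per GPU, O(N³) in total.

5. Experimental results
We reconstructed a volume with N = 1024 from K = 1200 projections of U × V = 1024 × 1012 pixels, using R = 512 and B = 8. The experimental machine was a PC with a 3.0-GHz Xeon E5450 CPU and 16 GB of main memory, connected to an NVIDIA Tesla S1070-400 with four GPUs. The software environment was CentOS 5.3, graphics driver 185.18.08 and CUDA 2.2.

5.1 Execution time
Table 2 shows the execution time for C = 1, 2 and 4, with a thread block (TB) size of 64 × 8. Compared with C = 1, reconstruction is 1.7 times faster for C = 2 and 2.6 times faster for C = 4. For C = 2, the two schemes take nearly the same time (53.15 s and 53.59 s). The speedup is lower than C for two reasons. First, CPU–GPU transfers over PCI Express shrink only from 14.88 s to 6.32 s. Second, under voxel parallelism every GPU filters all K projections, so the filtering time stays at 9.24 s.

Table 2 Breakdown of execution time (s).

                                              C = 1    C = 2          C = 2       C = 4
                                                       (projection)   (voxel)     (voxel)
  Initialization                               0.52     0.52           0.26        0.13
  Projection transfer, CPU→GPU                14.88     9.73           9.78        6.32
  Filtering                                    9.24     4.69           9.24        9.24
  Backprojection                              63.76    31.94          32.01       16.60
  Filtered-projection transfer, GPU→CPU        1.59     1.10           1.63        2.27
  Subvolume transfer, GPU→CPU                  1.27     1.41           0.67        0.35
  Summation on CPU                              -       3.76           0.00        0.00
  Total                                       91.26    53.15          53.59       34.91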
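The totals and speedups quoted above can be re-derived from the stage times in Table 2. The short script below does only that arithmetic. The configuration keys are this sketch's own labels, and the empty summation entry for C = 1 is entered as 0. The CPU–GPU transfer share relies on the stage labels used in Table 2.

```python
stage_times = {
    # initialization, CPU->GPU projections, filtering, backprojection,
    # GPU->CPU filtered projections, GPU->CPU subvolumes, CPU summation (seconds)
    "C=1": [0.52, 14.88, 9.24, 63.76, 1.59, 1.27, 0.00],
    "C=2 projection": [0.52, 9.73, 4.69, 31.94, 1.10, 1.41, 3.76],
    "C=2 voxel": [0.26, 9.78, 9.24, 32.01, 1.63, 0.67, 0.00],
    "C=4 voxel": [0.13, 6.32, 9.24, 16.60, 2.27, 0.35, 0.00],
}
reported_totals = {"C=1": 91.26, "C=2 projection": 53.15,
                   "C=2 voxel": 53.59, "C=4 voxel": 34.91}

totals = {name: round(sum(t), 2) for name, t in stage_times.items()}
speedups = {name: totals["C=1"] / total for name, total in totals.items()}
# Share of time spent in the three CPU-GPU transfer stages (indices 1, 4 and 5).
transfer_share = {name: (t[1] + t[4] + t[5]) / sum(t) for name, t in stage_times.items()}

for name in stage_times:
    print(f"{name:15s} total {totals[name]:6.2f} s  speedup {speedups[name]:.2f}  "
          f"CPU-GPU transfer {100 * transfer_share[name]:.0f}%")
```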

Fig. 4 Breakdown of execution time (%) for C = 1, C = 2 (projection parallelism), C = 2 (voxel parallelism) and C = 4.

5.2 Breakdown of execution time
As Fig. 4 shows, CPU–GPU data transfer accounts for about 20–25% of the execution time. These transfers include copies of filtered projections into a cudaArray 2) for texture access.

Table 3 CPU–GPU transfer measurements: C = 1: 2.81, 15.84; C = 2: 2.25, 18.40 and 2.79, 23.13; C = 4: 2.98, 33.26.

6. Conclusion
We presented an out-of-core cone beam reconstruction method that handles volumes exceeding the VRAM capacity by dividing them into subvolumes. We also compared two schemes for running the method on multiple GPUs: projection parallelism and voxel parallelism. The method reconstructs a 1024³-voxel volume, which does not fit in the VRAM of a single GPU. Using four GPUs, it runs 2.6 times faster than on one GPU.

Acknowledgments
This work was partly supported by grant A2024002 and a COE program.

References
1) GPGPU: General-Purpose Computation Using Graphics Hardware (2007). http://www.gpgpu.org/.
2) NVIDIA Corporation: CUDA Programming Guide Version 1.1 (2007).
3) Okitsu, Y., Ino, F. and Hagihara, K.: Fast Cone Beam Reconstruction Using the CUDA-enabled GPU, Proc. 15th Int'l Conf. High Performance Computing (HiPC'08), pp.108–119 (2008).
4) Scherl, H., Keck, B., Kowarschik, M. and Hornegger, J.: Fast GPU-Based CT Reconstruction using the Common Unified Device Architecture (CUDA), Proc. Nuclear Science Symp. and Medical Imaging Conf. (NSS/MIC'07), pp.4464–4466 (2007).
5) Noël, P.B., Walczak, A.M., Hoffmann, K.R., Xu, J., Corso, J.J. and Schafer, S.: Clinical Evaluation of GPU-Based Cone Beam Computed Tomography, Proc. High-Performance Medical Image Computing and Computer Aided Intervention (HP-MICCAI'08) (2008).
6) Feldkamp, L.A., Davis, L.C. and Kress, J.W.: Practical cone-beam algorithm, J. Optical Society of America, Vol.1, No.6, pp.612–619 (1984).
7) Shepp, L.A. and Logan, B.F.: The Fourier Reconstruction of a Head Section, IEEE Trans. Nuclear Science, Vol.21, No.3, pp.21–43 (1974).