
CUDA (title slide). Getting Started with GPGPU Using Tools, Spring 2012, Akihabara UDX, May 24, 2012.

Accelerating the FDTD method on GPUs. Outline: the FDTD method; GPU implementations with PGI Accelerator directives, CUDA, and OpenMP; results on Fermi GPUs (Tesla C2075/C2070, GTX 580) and GT200 GPUs (Tesla C1060, GTX 285); comparison with a PC cluster.

Speaker background: electromagnetic simulation with the FDTD and CIP methods, on platforms ranging from single PCs and PC clusters to FPGA, Cell/B.E., and GPU, programmed with MPI, Verilog/HDL, and CUDA/OpenCL.

GPU programming today: NVIDIA CUDA and OpenCL expose the GPU but require explicit CPU/GPU data management. This talk compares an FDTD solver written with directives (PGI Accelerator) against CUDA and OpenMP versions.

Outline (repeated): the FDTD method; PGI Accelerator, CUDA, and OpenMP implementations; Fermi GPU (Tesla C2075/C2070, GTX 580) and GT200 GPU (Tesla C1060, GTX 285) results; PC cluster comparison.

The FDTD (Finite-Difference Time-Domain) method solves the two curl equations of Maxwell:

\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad
\nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t}

Each spatial derivative is replaced by a second-order central difference on a staggered grid, e.g.

\frac{\partial F(x,y,z,t)}{\partial x} = \frac{F^{n}(i+\frac{1}{2},j,k) - F^{n}(i-\frac{1}{2},j,k)}{\Delta x} + O(\Delta x^{2})

and likewise for y and z, applied to all six field components.
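
As a concrete instance (a standard derivation, not shown on the slide; the lossy-medium form of c1 and c2 below is an assumption consistent with the ep/sig arrays in the later code listings), discretizing the x component of the second curl equation gives the Ex update used throughout the talk:

E_x^{n+1}(i,j,k) = c_1\, E_x^{n}(i,j,k)
 + c_2 \left[ \frac{H_z^{n+\frac{1}{2}}(i,j,k) - H_z^{n+\frac{1}{2}}(i,j-1,k)}{\Delta y}
 - \frac{H_y^{n+\frac{1}{2}}(i,j,k) - H_y^{n+\frac{1}{2}}(i,j,k-1)}{\Delta z} \right]

c_1 = \frac{1 - \sigma\Delta t / 2\varepsilon}{1 + \sigma\Delta t / 2\varepsilon}, \qquad
c_2 = \frac{\Delta t / \varepsilon}{1 + \sigma\Delta t / 2\varepsilon}

In the code listings the 1/Δ factors are folded into c2, since the grid spacing is uniform.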

The FDTD method. [Figure]

Parallelizing FDTD: with MPI/OpenMP on CPUs, and with CUDA/OpenCL on GPUs. With multiple GPUs, data exchanged between GPUs must cross PCI Express.

[Figure: GPU architecture. The host (CPU, over 10 GB/s to host memory) connects to the device (GPU) over PCI Express 2.0 at 16 GB/s. The GPU is an array of multiprocessors (MPs), each a group of streaming processors (SPs) with registers and shared memory/cache; GT200 has 30 MPs of 8 SPs, Fermi 16 MPs of 32 SPs. Device memory delivers over 100 GB/s; nodes interconnect via InfiniBand QDR at 5 GB/s.]

GPU specifications:

                     C2075    GTX 580   C1060    GTX 285
  Number of cores    448      512       240      240
  GFLOPS (single)    1030     1581      622      720
  Memory (MB)        6144     3072      4096     2048
  Bandwidth (GB/s)   144      192       102      159
  SM/caches (KB)     64 L1+SM, 768 L2 (Fermi)    16 SM (GT200)

Fermi parts have up to 512 cores and GT200 parts 240; single-precision peak approaches 1 TFLOPS, roughly ten times the ~100 GFLOPS of a Core i7, and memory bandwidth exceeds 100 GB/s.
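
A back-of-the-envelope estimate (mine, not from the slide) shows why memory bandwidth rather than peak GFLOPS bounds FDTD on these parts. Each Ex update touches six floats (five loads, one store, 24 B in single precision) for about six flops, so

\text{intensity} \approx \frac{6\ \text{flop}}{24\ \text{B}} = 0.25\ \text{flop/B}, \qquad
144\ \text{GB/s} \times 0.25\ \text{flop/B} \approx 36\ \text{GFLOPS (C2075)}

which is a few percent of the 1030 GFLOPS peak. The CUDA float time reported later (19.36 s for 256^3 cells, 1000 steps, 6 components, ~6 flops each) works out to about 30 GFLOPS, consistent with this bound.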

GPU implementation of FDTD in CUDA: (1) copy the field arrays from CPU to GPU; (2) time-march entirely on the GPU, one kernel per field update, without returning to the CPU; (3) copy the results back from GPU to CPU at the end. (Hardware: Tesla C2075/C2070, x2.)
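
A minimal CUDA sketch of this three-step flow (my illustration, not the author's code; the flattened array layout, launch geometry, and the placeholder coefficients c1 = 1.0f, c2 = 0.5f are assumptions):

  #include <cuda_runtime.h>
  #include <stdlib.h>

  /* Ex update: same stencil as the OpenMP listing later in the talk,
     on flattened 3-D arrays; Hz at index j sits at j+1/2 on the grid. */
  __global__ void update_ex(float *Ex, const float *Hy, const float *Hz,
                            float c1, float c2, int Ni, int Nj, int Nk)
  {
      int k = blockIdx.x * blockDim.x + threadIdx.x;
      int j = blockIdx.y;
      int i = blockIdx.z;  /* 3-D grids need compute capability 2.0 (Fermi) */
      if (i < Ni - 1 && j >= 1 && j < Nj - 1 && k >= 1 && k < Nk - 1) {
          size_t id = ((size_t)i * Nj + j) * Nk + k;
          Ex[id] = c1 * Ex[id]
                 + c2 * (Hz[id] - Hz[id - Nk]      /* (i, j-1, k) */
                       - Hy[id] + Hy[id - 1]);     /* (i, j, k-1) */
      }
  }

  int main(void)
  {
      const int Ni = 256, Nj = 256, Nk = 256, steps = 1000;
      size_t n = (size_t)Ni * Nj * Nk, bytes = n * sizeof(float);
      float *hEx = (float *)calloc(n, sizeof(float));
      float *dEx, *dHy, *dHz;

      /* (1) allocate device arrays and copy the host fields in once */
      cudaMalloc(&dEx, bytes); cudaMalloc(&dHy, bytes); cudaMalloc(&dHz, bytes);
      cudaMemcpy(dEx, hEx, bytes, cudaMemcpyHostToDevice);
      cudaMemset(dHy, 0, bytes); cudaMemset(dHz, 0, bytes);

      /* (2) time-march on the GPU; no host transfers inside the loop */
      dim3 block(256, 1, 1), grid((Nk + 255) / 256, Nj, Ni);
      for (int t = 0; t < steps; t++) {
          update_ex<<<grid, block>>>(dEx, dHy, dHz, 1.0f, 0.5f, Ni, Nj, Nk);
          /* ... Ey/Ez kernels, source injection, then Hx/Hy/Hz kernels ... */
      }

      /* (3) copy the result back once at the end */
      cudaMemcpy(hEx, dEx, bytes, cudaMemcpyDeviceToHost);
      cudaFree(dEx); cudaFree(dHy); cudaFree(dHz); free(hEx);
      return 0;
  }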

GPU programming models: CUDA/OpenCL (C/C++) and PGI CUDA Fortran give full control but require explicit kernels and transfers. Directive-based models in the style of OpenMP for C/Fortran, namely NVIDIA-backed OpenACC and the PGI Accelerator model, let the compiler generate GPU code and manage CPU/GPU transfers, trading some performance for far less effort than CUDA.

OpenMP version of the FDTD kernel (Ex update shown):

  for (t = 0.0; t < Te; t += dt) {
    #pragma omp parallel
    {
      /* Ex */
      #pragma omp for private(i, j, k)
      for (i = 0; i < Ni - 1; i++) {
        for (j = 1; j < Nj - 1; j++) {
          for (k = 1; k < Nk - 1; k++) {
            Ex[i][j][k] = c1 * Ex[i][j][k]
                        + c2 * (Hz[i][j][k] - Hz[i][j-1][k]
                              - Hy[i][j][k] + Hy[i][j][k-1]);
          }
        }
      }
      /* ... remaining E and H component updates ... */
    }
  }

PGI Accelerator version (the data region keeps the fields resident on the GPU across the whole time loop):

  #pragma acc data region copy(Ex[0:Ni][0:Nj][0:Nk]), \
      copyin(Ey[0:Ni][0:Nj][0:Nk], Ez[0:Ni][0:Nj][0:Nk], \
             Hx[0:Ni][0:Nj][0:Nk], Hy[0:Ni][0:Nj][0:Nk], Hz[0:Ni][0:Nj][0:Nk], \
             ep[0:Ni][0:Nj][0:Nk], sig[0:Ni][0:Nj][0:Nk])
  {
    for (t = 0.0; t < Te; t += dt) {
      #pragma acc region
      {
        /* Ex */
        #pragma acc for parallel
        for (i = 0; i < Ni - 1; i++) {
          #pragma acc for parallel, vector(256)
          for (j = 1; j < Nj - 1; j++) {
            #pragma acc for vector(512)
            for (k = 1; k < Nk - 1; k++) {
              Ex[i][j][k] = c1 * Ex[i][j][k]
                          + c2 * (Hz[i][j][k] - Hz[i][j-1][k]
                                - Hy[i][j][k] + Hy[i][j][k-1]);
            }
          }
        }
        /* ... */
      }
    }
  }

Outline (repeated): results on Fermi GPUs (Tesla C2075/C2070, GTX 580) and GT200 GPUs (Tesla C1060, GTX 285), comparing PGI Accelerator, CUDA, and OpenMP, plus the PC cluster comparison.

Experiment 1: Fermi GPUs. GPUs: Tesla C2075/C2070 and GeForce GTX 580. Compilers: PGI Accelerator C/C++ Workstation 12.2 and CUDA 4.0. CPU: Intel Core i7 980X (3.33 GHz), gcc 4.4.3 -O3 with OpenMP. OS: 64-bit Linux (Ubuntu 10.04 LTS server).

Validation on a 256^3 grid with a Jx source, Ex observed 1.0 m away. [Figure: electric field Ex (V/m), -1.5 to 1.0, vs. time (ns), 3.0 to 7.0; curves: Exact, CPU, GPU.]

Directive version vs. CPU (256 x 256 x 256 grid, 1000 steps, average of 5 runs; t_C1: 1 CPU core, t_C8: 8 threads, t_GD: GPU directive version):

  GPU       precision   t_C1 (s)   t_C8 (s)   t_GD (s)   t_C8/t_GD
  GTX 580   float        1410.55     330.52      32.67       10.12
            double       1124.13     349.03      39.79        8.77
  C2075     float        1410.55     330.52      48.75        6.78
            double       1124.13     349.03      65.03        5.37

(Core i7 980X: 10; GTX 580: 5; C2075: 20.) Against the 8-thread CPU, the GTX 580 is 10x (float) and 9x (double) faster and the C2075 7x and 5x; against one CPU core, the GTX 580 is 43x and 28x faster and the C2075 29x and 17x.

Directive version vs. hand-written CUDA (same 256^3, 1000-step problem; t_GC: CUDA version):

  GPU       precision   t_GD (s)   t_GC (s)   t_GC/t_GD
  GTX 580   float          32.67      10.10       0.309
            double         39.79      21.25       0.534
  C2075     float          48.75      19.36       0.397
            double         65.03      39.28       0.604

The directive version achieves 31% (float) and 53% (double) of CUDA performance on the GTX 580, and 40% and 60% on the C2075.

A more realistic problem (320 x 480 x 320 grid, 5000 steps, CUDA 4.0):

  GPU       t_C1 (s)    t_GD (s)   t_GC (s)   t_GC/t_GD
  GTX 580   18140.44      484.02     222.87       0.460
  C2070     18140.44      724.21     406.62       0.561

[Figure: field snapshots at (a) 3 ns, (b) 6 ns, and (c) 9 ns; panel (d) corresponds to (c).]

Topic 2: a higher-accuracy FDTD scheme, FDTD(2,4), which cuts numerical dispersion error to roughly 1/10, and its GPU performance (related presentation, May 29-31).

FDTD(2,4): fourth-order accuracy in space, second-order in time. Standard FDTD uses second-order central differences in both space and time:

\frac{\partial F(x,y,z,t)}{\partial x} = \frac{F^{n}(i+\frac{1}{2},j,k) - F^{n}(i-\frac{1}{2},j,k)}{\Delta x} + O(\Delta x^{2})

\frac{\partial F(x,y,z,t)}{\partial t} = \frac{F^{n+\frac{1}{2}}(i,j,k) - F^{n-\frac{1}{2}}(i,j,k)}{\Delta t} + O(\Delta t^{2})

FDTD(2,4) replaces the spatial difference with a fourth-order one:

\frac{\partial F(x,y,z,t)}{\partial x} = \frac{9}{8}\,\frac{F^{n}(i+\frac{1}{2},j,k) - F^{n}(i-\frac{1}{2},j,k)}{\Delta x} - \frac{1}{24}\,\frac{F^{n}(i+\frac{3}{2},j,k) - F^{n}(i-\frac{3}{2},j,k)}{\Delta x} + O(\Delta x^{4})
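
Folded into the Ex loop, the wider stencil looks like this (my sketch, not the author's code; coefficients from the formula above, c1/c2 and array shapes as in the earlier listings, j/k bounds widened by one cell):

  /* FDTD(2,4) Ex update; C99 variable-length array parameters assumed */
  static void update_ex_24(int Ni, int Nj, int Nk,
                           float Ex[Ni][Nj][Nk],
                           float Hy[Ni][Nj][Nk], float Hz[Ni][Nj][Nk],
                           float c1, float c2)
  {
      for (int i = 0; i < Ni - 1; i++)
          for (int j = 2; j < Nj - 2; j++)
              for (int k = 2; k < Nk - 2; k++)
                  Ex[i][j][k] = c1 * Ex[i][j][k]
                      + c2 * (  9.0f/8.0f  * (Hz[i][j][k]   - Hz[i][j-1][k])
                              - 1.0f/24.0f * (Hz[i][j+1][k] - Hz[i][j-2][k])
                              - 9.0f/8.0f  * (Hy[i][j][k]   - Hy[i][j][k-1])
                              + 1.0f/24.0f * (Hy[i][j][k+1] - Hy[i][j][k-2]));
  }

The extra two reads per difference are why the tables below show FDTD(2,4) costing only slightly more than standard FDTD on the GPU.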

FDTD(2,4) vs. CPU (256^3 grid, 1000 steps, average of 5; superscript H: FDTD(2,4), F: standard FDTD):

  GPU       precision   t^H_C8 (s)   t^H_GD (s)   t^F_GD (s)   t^H_C8/t^H_GD
  GTX 580   float           391.30        34.67        32.67           11.98
            double          431.84        46.82        39.79            9.22
  C2075     float           391.30        52.39        48.75            7.47
            double          431.84        74.73        65.03            5.78

Against the 8-thread CPU: GTX 580 12x (float) and 9x (double), C2075 7x and 6x. On the GPU, the fourth-order scheme costs only about 1.1-1.2x the runtime of standard FDTD.

FDTD(2,4), directives vs. CUDA (same problem):

  GPU       precision   t^H_GD (s)   t^H_GC (s)   t^H_GC/t^H_GD   t^F_GC/t^F_GD
  GTX 580   float            34.67        18.83           0.543           0.309
            double           46.82        40.32           0.861           0.534
  C2075     float            52.39        24.09           0.460           0.397
            double           74.73        70.90           0.945           0.604

For FDTD(2,4) the directive version reaches 54-86% of CUDA performance on the GTX 580 and 46-95% on the C2075, a smaller gap than for standard FDTD.

Experiment 3: GT200-generation GPUs (as of March 2011), GeForce GTX 285 and Tesla C1060. Compilers: PGI Accelerator Workstation C/C++ 10.9 and CUDA 3.1. CPU: Intel Core i7 980X (3.33 GHz), gcc 4.4.3 -O3 with OpenMP.

GT200 directive version vs. CPU (256^3 grid, 1000 steps, average of 5):

  GPU       precision   t_C1 (s)   t_C8 (s)   t_GD (s)   t_C8/t_GD
  GTX 285   float        1410.55     330.52     115.83        2.85
  C1060     float        1410.55     330.52     122.63        2.70
  C2070     float        1410.55     330.52      65.01        5.08

Against the 8-thread CPU: GTX 285 and C1060 about 3x, C2070 about 5x.

GT200, directives vs. CUDA:

  GPU       precision   t_GD (s)   t_GC (s)   t_GC/t_GD
  GTX 285   float         115.83      22.49       0.194
  C1060     float         122.63      24.56       0.200
  C2070     float          65.01      20.81       0.320

With the older toolchain, the directive version reached only about 20% of CUDA performance on the GTX 285 and C1060, and 32% on the C2070.

Comparison 4: a 2005-era supercomputer and PC cluster.

                   supercomputer SX-7 (NEC)   our PC cluster at Tohoku Univ. (handmade)
  # of CPUs        240                        16 (Pentium 4, 3.0 GHz)
  memory           1920 GB                    8 GB
  job class        max 32 CPUs, 256 GB        16 CPUs, 8 GB
  accounting       0.4 yen/sec                0
  parallelization  automatic (sxcc -Pauto)    message passing (MPI)

FDTD computation time (160 x 160 x 160 grid, 1000 steps, average of 5):

  architecture              FDTD (s)   FDTD(2,4) (s)
  NEC SX-7                      5.24            8.02
  Pentium 4 2.8 GHz x 16      642.80         2816.94
  C2075 (PGI 12.2)             21.16           23.59
  C2075 (CUDA 4.0)              9.34           14.33

Summary of the FDTD GPU results. Fermi: the directive version beats the 8-thread CPU by about 10x (GTX 580) and 6x (C2075), reaching 30-50% (GTX 580) and 40-60% (C2075) of hand-written CUDA performance; in other words, CUDA still roughly halves the runtime. FDTD(2,4) costs only about 1.2x standard FDTD on the GPU, and its directive version reaches up to ~90% of CUDA. GT200: about 3x over the 8-thread CPU and only ~20% of CUDA. The C2075 directive version runs at about 1/4 the speed of the NEC SX-7.

Contact: Jun SONODA, Sendai National College of Technology (postal code 989-3128). E-mail: sonoda@sendai-nct.ac.jp

Research grants and projects (fiscal years in Heisei): 1. FDTD methods (H21, H22-23); Cell/B.E. FDTD (Cell Challenge 2009); IPv6 PC cluster (H21-23). 2. GPU computing (H23); with NTT (H23); JST A-STEP (H20-22); H19-21.

Part 1.

Research overview: the FDTD (Finite-Difference Time-Domain) and CIP (Constrained Interpolation Profile) methods for solving Maxwell's equations.

Numerical dispersion of the FDTD method. The discrete scheme satisfies

\left[\frac{1}{v\Delta t}\sin\left(\frac{\omega\Delta t}{2}\right)\right]^{2} = \sum_{\zeta=x,y,z}\left[\frac{1}{\Delta\zeta}\sin\left(\frac{k_{\zeta}\Delta\zeta}{2}\right)\right]^{2}

so the numerical phase velocity deviates from the physical one and depends on frequency, direction, and grid resolution.
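
A quick consequence (my worked example, not on the slide): in 1-D with a small time step, the relation reduces to \omega = (2v/\Delta)\sin(k\Delta/2), so

\frac{v_{p}}{v} = \frac{\sin(k\Delta/2)}{k\Delta/2} \approx 1 - \frac{(k\Delta)^{2}}{24}

and for \Delta = \lambda/10 (k\Delta = 2\pi/10) the phase-velocity error is about 1.6%, which is why the accumulated error in the table below grows at roughly 1.7% per wavelength traveled at that resolution.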

[Figure: maximum dispersion error (c0 - cn)/c0 (%), 0.01 to 1000, vs. propagation distance (λ), 1 to 1000, for Δ = λ/10, λ/20, λ/40, λ/60, λ/80, λ/100.]

Accumulated dispersion error: with cell size Δ = λ/m and propagation distance R = nλ, an empirical estimate of the maximum error e_R (%) agrees with FDTD experiments:

  model   Δ       R      e_Rmax (%) by our eq.   e_FDTD (%) by FDTD
  2-D     λ/10     30λ    51                      51
  2-D     λ/10     60λ    102                     102
  2-D     λ/20     30λ    13                      13
  2-D     λ/20    120λ    51                      50
  3-D     λ/10     15λ    26                      25
  3-D     λ/10     30λ    51                      51

An optimized second-order FDTD scheme (N = 2, M = 2). Generalize the two-point-pair stencil with a free coefficient a1:

\frac{\partial f(x)}{\partial x} = a_{1}\,\frac{f(x+\frac{\Delta}{2}) - f(x-\frac{\Delta}{2})}{\Delta} + \frac{1-a_{1}}{3}\,\frac{f(x+\frac{3\Delta}{2}) - f(x-\frac{3\Delta}{2})}{\Delta} + O(\Delta^{2})

Its numerical wavenumber is

\tilde{k} = \frac{2}{\Delta}\left[a_{1}\sin\frac{k\Delta}{2} + \frac{1-a_{1}}{3}\sin\frac{3k\Delta}{2}\right]

and a1 is chosen to minimize the integrated wavenumber error over a design band β (-π ≤ β ≤ π):

\Theta = \int_{-\beta}^{\beta} \left(\tilde{k}\Delta - k\Delta\right)^{2} d(k\Delta)

Determining a1 for the optimized scheme: (1) choose the band β to optimize over; (2) solve ∂Θ/∂a1 = 0 for a1, which gives

a_{1} = \frac{8\left(27\sin\frac{\beta}{2} - \sin\frac{3\beta}{2}\right) - 12\beta\left(9\cos\frac{\beta}{2} - \cos\frac{3\beta}{2}\right)}{60\beta - 90\sin\beta + 18\sin 2\beta - 2\sin 3\beta} + \frac{6\beta - 18\sin\beta + 9\sin 2\beta - 2\sin 3\beta}{60\beta - 90\sin\beta + 18\sin 2\beta - 2\sin 3\beta}

As β → 0 this recovers a1 = 9/8, the FDTD(2,4) value.

[Figure: dispersion error e_θφ, 10^-5 to 10^-1, vs. Courant number, 0.01 to 1; curves: FDTD(2,2), FDTD(2,4), Tam 1993, Wang 1996, and the proposed scheme.]


[Figure: two panels of electric field Ex (V/m), -0.002 to 0.002, vs. time (ns), 13 to 19; left: measurement vs. standard FDTD, right: measurement vs. optimized FDTD.]

Hardware acceleration of FDTD and CIP: PC clusters, FPGA (Field-Programmable Gate Array), Cell Broadband Engine (Cell/B.E.), and GPU (Graphics Processing Unit).

FDTD on the Cell/B.E.: the Cell Broadband Engine, the CPU developed by SONY and IBM for the PS3, couples one PPE with eight SPEs.

[Figure: pipelined FDTD on the SPEs, with SPE 1 and SPE 2 advancing time levels n, n+1/2, ..., n+2 over grid cells i-2 ... i+5/2 along z while streaming field data to and from main memory.]

[Figure: PS3 FDTD speedup ratio, 1 to 6, vs. number of SPEs, 1 to 6; curves: TSP Large/Small, Parallel Large/Small, Ideal.] Roughly 10x faster than a Xeon 2.8 GHz Mac Pro.

PC cluster middleware: SCore, Clustermatic (Los Alamos National Lab.), Windows HPC Server 2008 (Microsoft), and Live-Linux-based systems such as KNOPPIX; nodes are conventionally configured via DHCP.

An IPv6 PC cluster booted from USB/CD/DVD: a single server PC boots a Live Linux from removable media, and the remaining PCs boot over the network via IPv6, turning ordinary PCs into a cluster without touching their installed OS.

[Figure: HTTP-FUSE-KNOPPIX cluster boot. The server PC (PC 01) boots a Live Linux from USB or CD carrying the server system and the clients' boot loader, kernel, and block file; it wakes the client PCs (PC 02 ... PC n) with WOL magic packets and serves them over TFTP, HTTP, and NFS (/home).]
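
For illustration, a Wake-on-LAN magic packet is just 6 bytes of 0xFF followed by the target MAC repeated 16 times, sent as a UDP broadcast. A minimal sketch (a hypothetical helper, not code from the talk):

  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/types.h>
  #include <unistd.h>

  int send_wol(const unsigned char mac[6])
  {
      unsigned char pkt[102];
      memset(pkt, 0xFF, 6);                    /* 6-byte sync stream */
      for (int i = 0; i < 16; i++)             /* MAC repeated 16 times */
          memcpy(pkt + 6 + 6 * i, mac, 6);

      int s = socket(AF_INET, SOCK_DGRAM, 0);
      if (s < 0) return -1;
      int on = 1;
      setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof on);

      struct sockaddr_in to;
      memset(&to, 0, sizeof to);
      to.sin_family = AF_INET;
      to.sin_port = htons(9);                  /* discard port, common for WOL */
      to.sin_addr.s_addr = htonl(INADDR_BROADCAST);

      ssize_t n = sendto(s, pkt, sizeof pkt, 0,
                         (struct sockaddr *)&to, sizeof to);
      close(s);
      return n == (ssize_t)sizeof pkt ? 0 : -1;
  }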

[Figure: boot-time ratio, 0.8 to 2.2, vs. number of PCs, 0 to 100, for our system and NFS_servr x 3, with marked times 71.2 s, 89.5 s, 108.5 s, and 144.9 s; our system boots in about half the time.]

IPv6 removes the DHCP server: with stateless address autoconfiguration, each PC derives its IP address from its own MAC address, so no per-node address management is needed.
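
Concretely, the interface identifier is the modified EUI-64 built from the MAC; a sketch of this standard mechanism (my example, not code from the talk):

  #include <stdio.h>

  /* Build the modified EUI-64 interface ID from a 48-bit MAC:
     flip the universal/local bit and splice in 0xFFFE. */
  static void mac_to_eui64(const unsigned char mac[6], unsigned char iid[8])
  {
      iid[0] = mac[0] ^ 0x02;   /* invert the U/L bit */
      iid[1] = mac[1];
      iid[2] = mac[2];
      iid[3] = 0xFF;            /* inserted constant 0xFFFE */
      iid[4] = 0xFE;
      iid[5] = mac[3];
      iid[6] = mac[4];
      iid[7] = mac[5];
  }

  int main(void)
  {
      const unsigned char mac[6] = {0x00, 0x1B, 0x21, 0x3C, 0x4D, 0x5E};
      unsigned char iid[8];
      mac_to_eui64(mac, iid);

      /* link-local address: fe80:: followed by the 64-bit interface ID */
      printf("fe80::%02x%02x:%02x%02x:%02x%02x:%02x%02x\n",
             iid[0], iid[1], iid[2], iid[3], iid[4], iid[5], iid[6], iid[7]);
      return 0;
  }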

[Figure: boot time (s), 0 to 120, vs. number of PCs, 4 to 20, for IPv4 NFS, IPv4 SSHFS, and IPv6 SSHFS.]

[Figure: NPB EP Class D throughput (Mop/s), 0 to 1600, vs. number of cores, 0 to 80, for IPv4 NFS, IPv4 SSHFS, and IPv6 SSHFS.]

Part 2.


GPR (Ground Penetrating Radar) has been simulated with FDTD since the 1990s; earlier acceleration used FPGA and Cell/B.E., here the GPU.

[Figure: 2-D/3-D GPR model, coordinates x, y, z with origin O at the surface: a current source J in air 0.1 m above the ground (εr = 4.0, σ = 0.001 S/m), and a 0.1 m cylinder (εr = 1.0, σ = 0.0 S/m) buried at 1.0 m depth.]

Simulation conditions:

                    2-D                               3-D
  problem size      1024 x 1024                       256 x 256 x 256
  source            line current                      point current
  pulse             Gaussian (-3 dB width: 0.5 ns)
  scan range        -2.5 ≤ x ≤ +2.5 (Δx = 0.05 m)     -1.0 ≤ x/y ≤ +1.0 (Δx = Δy = 0.1 m)
  # of scannings    100                               400
  ground            εr = 4.0, σ = 0.001 S/m
  cylinder          εr = 1.0, σ = 0.0 S/m
  increments        Δ = 0.01 m, Δs = 0.01 x 10^-6 s
  # of time steps   3000
  ABC               1st-order Mur
  compiler          CUDA 4.0 (gcc 4.4.5 -O3)

Compute environment: ten GeForce GTX 580 GPUs, two GPUs per PC.

[Figure: 3-D GPR simulation results.]

3-D runtimes, CPU vs. GPU: Core i7 980X x 10: 65; GTX 580 x 10: 30.

Applications of FDTD analysis.

[Figure: a staged multilayer of total length L along x: stage 0 is a single layer of ε2, μ2 with thickness d0 = L embedded in ε1, μ1; stages 1 and 2 subdivide it into thinner layers d1 and d2.]

[Figure: two panels of transmission coefficient (dB), 0 to -35/-40, vs. d/λ, 0 to 0.5; left: 1st, 2nd, and 3rd stages at ε2/ε1 = 4.0; right: 3, 7, and 15 layers.]

[Figure: minimum transmission (dB), -35 to 0, and Q value, 0 to 15, vs. stage number, 1 to 3.]

[Figure: maximum resonance (dB), 0 to 20, and Q value, 10 to 1000 on a log scale, vs. stage number, 1 to 3.]

An SiO2-TiO2 two-material multilayer, examined over 400-800 nm.

[Figure: transmission, 0 to 1.2, vs. d2/λ2, 0.14 to 0.24; measured vs. FDTD.]

Microwave verification: S21 measured from 1 to 6 GHz and compared with FDTD.

Layer material: εr = 2.25.

[Figure: transmission coefficient (dB), -8.0 to 0.0, vs. frequency (GHz), 1 to 6; measured vs. FDTD.]

Lightning electromagnetic fields for LLS (lightning location systems): 3-D MW-FDTD analysis on a PC cluster.

MW-FDTD (Moving Window FDTD) updates only a window of cells that travels with the propagating pulse instead of the whole FDTD domain, cutting memory and computation for long propagation paths such as those in LLS analysis.
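
A 1-D skeleton of the window bookkeeping (my illustration of the idea, not the author's implementation; the window length W is an assumption):

  #include <string.h>

  #define W 4096                 /* window length in cells (assumed) */

  static float Ez[W], Hy[W];     /* fields stored only inside the window */
  static long  origin = 0;       /* global grid index of window cell 0 */

  /* Slide the window one cell to the right as the pulse advances:
     per-step cost stays O(W) regardless of total path length. */
  static void shift_window(void)
  {
      memmove(Ez, Ez + 1, (W - 1) * sizeof Ez[0]);
      memmove(Hy, Hy + 1, (W - 1) * sizeof Hy[0]);
      Ez[W - 1] = 0.0f;          /* fresh, still-quiet cell enters on the right */
      Hy[W - 1] = 0.0f;
      origin++;                  /* window now covers [origin, origin + W) */
  }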

3-D MW-FDTD on the PC cluster: relative to whole-domain FDTD, the slide reports speedups of 47x and 20x with memory reduced to 1/64 and 1/16, for a window fraction F = 1/16.

Lightning-discharge electromagnetic field analysis with terrain models.

GPU (Graphics Processing Unit).

SfM (Structure from Motion): reconstructing a 3-D model from a set of ordinary photographs and feeding it to FDTD as the analysis geometry.

Realistic visualization of the radio environment with AR. The FDTD method produces the time response of all six electromagnetic field components, and the results must be displayed intelligibly; 3-D visualization packages such as AVS are costly and lack realism. AR (Augmented Reality) maps artificial imagery onto live video, enabling visualization of the Poynting-vector distribution in place.
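
The quantity being rendered is the Poynting vector, computed pointwise from the six stored components (the standard definition, not spelled out on the slide):

\mathbf{S} = \mathbf{E} \times \mathbf{H} = \left(E_y H_z - E_z H_y,\; E_z H_x - E_x H_z,\; E_x H_y - E_y H_x\right)

whose magnitude and direction give the local power flow overlaid on the camera image.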