2ndD3.eps


Transcription

1 CUDA GPGPU 2012 UDX 12/5/24 p. 1

2 FDTD GPU FDTD GPU FDTD FDTD FDTD PGI Accelerator CUDA OpenMP Fermi GPU (Tesla C2075/C2070, GTX 580) GT200 GPU (Tesla C1060, GTX 285) PC

3 FDTD CIP 1 PC / PC FPGA Cell/B.E. GPU MPI Verilog/HDL CUDA/OpenCL

4 GPU NVIDIA CUDA OpenCL CUDA CPU/GPU GPU CPU/GPU FDTD (PGI Accelerator) CUDA OpenMP

5 FDTD GPU FDTD GPU FDTD FDTD FDTD PGI Accelerator CUDA OpenMP Fermi GPU (Tesla C2075/C2070, GTX 580) GT200 GPU (Tesla C1060, GTX 285) PC

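6 FDTD (Finite-Difference Time-Domain) method. Maxwell's two curl equations: $\nabla \times \mathbf{E} = -\dfrac{\partial \mathbf{B}}{\partial t}$, $\nabla \times \mathbf{H} = \mathbf{J} + \dfrac{\partial \mathbf{D}}{\partial t}$. Second-order central difference: $\dfrac{\partial F(x, y, z, t)}{\partial x} = \dfrac{F^{n}(i + \frac{1}{2}, j, k) - F^{n}(i - \frac{1}{2}, j, k)}{\Delta x} + O(\Delta x^{2})$ (for x, y, z: six equations).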
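The central difference on this slide, applied to both curl equations on a staggered grid, gives the leapfrog update at the core of FDTD. The following is a minimal 1-D sketch in C, not the author's code. It assumes a vacuum region, an Ez/Hy polarization, and a magnetic field pre-multiplied by the wave impedance so that a single Courant number s = c Δt/Δx appears in both updates. The names `fdtd1d_step` and `triangle` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* One leapfrog step of a 1-D vacuum Yee grid in normalized units.
 * ez[0..n-1] lives on integer nodes; hy[0..n-2] lives on half nodes
 * (hy[i] sits at i + 1/2) and is scaled by the wave impedance.
 * s = c*dt/dx is the Courant number. */
void fdtd1d_step(double *ez, double *hy, size_t n, double s)
{
    assert(ez != NULL && hy != NULL && n >= 3);
    assert(s > 0.0 && s <= 1.0);            /* 1-D stability limit */

    /* H update, time n-1/2 -> n+1/2: central difference of E in space. */
    for (size_t i = 0; i + 1 < n; i++)
        hy[i] += s * (ez[i + 1] - ez[i]);

    /* E update, time n -> n+1: central difference of H in space.
     * The two end nodes are never updated, so they keep their initial
     * values (a PEC wall when they are initialised to zero). */
    for (size_t i = 1; i + 1 < n; i++)
        ez[i] += s * (hy[i] - hy[i - 1]);
}

/* Hypothetical test pulse: a triangle of half-width 10 cells. */
double triangle(double x, double centre)
{
    double d = x > centre ? x - centre : centre - x;
    return d < 10.0 ? 1.0 - d / 10.0 : 0.0;
}
```

In this 1-D setting with s = 1, a right-going pulse initialised as `ez[i] = f(i)`, `hy[i] = -f(i + 1)` advances by exactly one cell per call, which makes a convenient sanity check before moving to the 3-D loops shown later in the slides.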

7 FDTD FDTD

8 FDTD MPI/OpenMP GPU CUDA/OpenCL GPU GPU PCI Express

9 GPU architecture (figure). Host (CPU): CPU and host memory (over 10 GB/s), connected to the device over PCI Express (GB/s). Device (GPU): control unit and multiprocessors (MPs), each with SPs, registers, and SM/cache, plus device memory (over 100 GB/s). GT200: 30 MPs, 8 SPs each; Fermi: 16 MPs, 32 SPs each. Other labels: 5 GB/s, Infiniband QDR.

10 GPU C2075 GTX 580 C1060 GTX 285 Number of cores GFLOPS (single) Memory (MB) Bandwidth (GB/s) SM/Caches (KB) 64 L1+SM, 768 L2 SM 16 Fermi 512 GT TFLOPS Core i7 100 GFLOPS 100 GB/s

11 FDTD GPU GPU FDTD CUDA 1. CPU GPU 2. GPU CPU 3. CPU FDTD GPU GPU GPU C2075/C

12 GPU CUDA/OpenCL CUDA/OpenCL C/C++ Fortran PGI CUDA Fortran OpenMP C/Fortran CUDA NVIDIA OpenACC PGI Accelerator OpenACC GPU CPU/GPU CUDA

13 OpenMP FDTD

    for (t = 0.0; t < Te; t += dt) {
    #pragma omp parallel
        {
            // Ex
    #pragma omp for private(i, j, k)
            for (i = 0; i < Ni - 1; i++) {
                for (j = 1; j < Nj - 1; j++) {
                    for (k = 1; k < Nk - 1; k++) {
                        Ex[i][j][k] = c1 * Ex[i][j][k]
                                    + c2 * (Hz[i][j][k] - Hz[i][j - 1][k]
                                          - Hy[i][j][k] + Hy[i][j][k - 1]);
                    }
                }
            }
            // The slide is cut off here; the remaining field updates are not shown.
        }
    }

14 PGI Accelerator FDTD

    #pragma acc data region copy(Ex[0:Ni][0:Nj][0:Nk]),
        copyin(Ey[0:Ni][0:Nj][0:Nk], Ez[0:Ni][0:Nj]...
               Hx[0:Ni][0:Nj][0:Nk], Hy[0:Ni][0:Nj]...
               ep[0:Ni][0:Nj][0:Nk], sig[0:Ni][0:Nj...
    {
        for (t = 0.0; t < Te; t += dt) {
    #pragma acc region
            {
                // Ex
    #pragma acc for parallel
                for (i = 0; i < Ni - 1; i++) {
    #pragma acc for parallel, vector(256)
                    for (j = 1; j < Nj - 1; j++) {
    #pragma acc for vector(512)
                        for (k = 1; k < Nk - 1; k++) {
                            Ex[i][j][k] = c1 * Ex[i][j][k]
                                        + c2 * (Hz[i][j][k] - Hz[i][j - 1][k]
                                              - Hy[i][j][k] + Hy[i][j][k - 1]);
                        }
                    }
                }
                // The slide is cut off here; the data clauses above are also truncated on the slide.
            }
        }
    }

15 FDTD GPU FDTD GPU FDTD FDTD FDTD PGI Accelerator CUDA OpenMP Fermi GPU (Tesla C2075/C2070, GTX 580) GT200 GPU (Tesla C1060, GTX 285) PC

16 1 Fermi GPU GPU Tesla C2075/C2070, GeForce GTX 580 PGI Accelerator C/C++ Workstation 12.2 CUDA 4.0 CPU Intel Core i7 980X (3.33 GHz) gcc -O3 OpenMP OS: 64-bit Linux (Ubuntu LTS server)

17 256^3 J_x 1.0 m E_x CPU Exact CPU GPU Electric field Ex (V/m) Time (ns)

18 CPU 8 CPU GPU GPU precision CPU t_C1 (s) CPU t_C8 (s) GPU t_GD (s) t_C8/t_GD GTX 580 float double C2075 float double Core i7 980X:10 GTX580:5 C2075:20 CPU 8 GTX 580: 10 9 C2075: 7 5 CPU 1 GTX 580: C2075:

19 CUDA CUDA GPU precision GPU t_GD (s) GPU t_GC (s) t_GC/t_GD GTX 580 float double C2075 float double CUDA GTX 580: 31% 53% C2075: 40% 60%

20 PC CUDA 5000 GPU CPU t_C1 (s) GPU t_GD (s) GPU t_GC (s) t C G /td G GTX C

21 (a) 3 ns later (b) 6 ns later (c) 9 ns later (d) (c)

22 2 FDTD FDTD 1/10 GPU 5/29-31

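23 Fourth-order FDTD. Standard FDTD, second order: $\dfrac{\partial F(x, y, z, t)}{\partial x} = \dfrac{F^{n}(i + \frac{1}{2}, j, k) - F^{n}(i - \frac{1}{2}, j, k)}{\Delta x} + O(\Delta x^{2})$, $\dfrac{\partial F(x, y, z, t)}{\partial t} = \dfrac{F^{n + \frac{1}{2}}(i, j, k) - F^{n - \frac{1}{2}}(i, j, k)}{\Delta t} + O(\Delta t^{2})$. FDTD(2,4), fourth order in space and second order in time: $\dfrac{\partial F(x, y, z, t)}{\partial x} = \dfrac{9}{8}\,\dfrac{F^{n}(i + \frac{1}{2}, j, k) - F^{n}(i - \frac{1}{2}, j, k)}{\Delta x} - \dfrac{1}{24}\,\dfrac{F^{n}(i + \frac{3}{2}, j, k) - F^{n}(i - \frac{3}{2}, j, k)}{\Delta x} + O(\Delta x^{4})$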
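To make the difference between the two spatial stencils on this slide concrete, the sketch below evaluates both on a smooth function and lets one watch the error shrink as Δx is halved. It is an illustration only: the function-pointer interface and the names `diff2`, `diff4`, and `quintic` are assumptions made for this example, not part of the original code.

```c
#include <assert.h>

typedef double (*scalar_fn)(double);

/* Second-order staggered central difference, as in standard FDTD. */
double diff2(scalar_fn f, double x, double dx)
{
    assert(dx > 0.0);
    return (f(x + 0.5 * dx) - f(x - 0.5 * dx)) / dx;
}

/* Fourth-order staggered difference used by FDTD(2,4):
 * weight 9/8 on the +-1/2 samples and -1/24 on the +-3/2 samples. */
double diff4(scalar_fn f, double x, double dx)
{
    assert(dx > 0.0);
    return (9.0 / 8.0) * (f(x + 0.5 * dx) - f(x - 0.5 * dx)) / dx
         - (1.0 / 24.0) * (f(x + 1.5 * dx) - f(x - 1.5 * dx)) / dx;
}

/* Hypothetical test function f(x) = x^5, whose exact derivative is 5 x^4. */
double quintic(double x)
{
    return x * x * x * x * x;
}
```

Halving Δx should cut the `diff2` error by about 4 and the `diff4` error by about 16, in line with the O(Δx²) and O(Δx⁴) terms. The price is a wider stencil, which means more memory traffic per cell on a GPU.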

24 CPU 8 CPU GPU GPU precision CPU t^H_C8 (s) GPU t^H_GD (s) GPU t^F_GD (s) t^H_C8/t^H_GD GTX 580 float double C2075 float double CPU 8 GTX 580: 12 9 C2075: 7 6 FDTD GTX 580: C2075:

25 CUDA CUDA GPU precision GPU t^H_GD (s) GPU t^H_GC (s) t^H_GC/t^H_GD t^F_GC/t^F_GD GTX 580 float double C2075 float double CUDA GTX 580: 54 % 86 % C2075: 46 % 95 %

26 3 GT 200 GPU GT 200 GeForce GTX 285 Tesla C1060 PGI Accelerator Workstation C/C++ CUDA 3.1 CPU Intel Core i7 980X (3.33 GHz) gcc -O3 OpenMP

27 CPU 8 CPU GPU GPU precision CPU t_C1 (s) CPU t_C8 (s) GPU t_GD (s) t_C8/t_GD GTX 285 float C1060 float C2070 float CPU 8 GTX 285: 3 C1060: 3 C2070: 5

28 CUDA CUDA GPU precision GPU t_GD (s) GPU t_GC (s) t_GC/t_GD GTX 285 float C1060 float C2070 float CUDA GTX 285: 20 % C1060: 20 % C2070: 32 %

29 4 PC 2005 PC supercomputer SX-7 our PC cluster at Tohoku Univ. Pentium GHz 16 (NEC) (handmade) # of CPUs memory 1920 Gbyte 8 Gbyte job class max 32 CPU, 256 Gbyte 16 CPU, 8 Gbyte accounting 0.4 yen/sec 0 parallelize auto (sxcc -Pauto) Message Passing (MPI)

30 PC PC FDTD computation time [s] architecture FDTD FDTD(2,4) NEC SX Pentium 4 2.8GHz C2075 (PGI 12.2) C2075 (CUDA 4.0)

31 FDTD GPU Fermi GPU CPU 8 GTX C CUDA GTX % C % CUDA CUDA 50 % FDTD FDTD 1.2 CUDA 90 % GT 200 GPU CPU 8 3 CUDA 20 % NEC SX-7 C2075 1/4

32 X Maxwell Jun SONODA sonoda@sendai-nct.ac.jp

33 1. FDTD H21 H22,23 Cell/B.E. FDTD Cell Challenge IPv6 PC H GPU H23 NTT H23 JST A-STEP H20 H22 H19 H21

34 1.

35 FDTD (Finite-Difference Time-Domain) CIP (Constrained Interpolation Profile) FDTD CIP Maxwell FDTD CIP

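36 FDTD numerical dispersion relation: $\left[\dfrac{1}{v\Delta t}\sin\left(\dfrac{\omega\Delta t}{2}\right)\right]^{2} = \sum_{\zeta = x, y, z}\left[\dfrac{1}{\Delta\zeta}\sin\left(\dfrac{k_{\zeta}\Delta\zeta}{2}\right)\right]^{2}$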

37 Figure: maximum dispersion error $(c_0 - c_n)/c_0$ (%) versus propagation distance (λ), for Δ=λ/10, Δ=λ/20, Δ=λ/40, Δ=λ/60, Δ=λ/80, Δ=λ/

38 Δ=λ/m R = nλ 1.7 n e R 100 log(m) 1 (%) model Δ R e Rmax (%) e FDTD (%) by our eq. by FDTD 2-D λ/10 30λ λ/10 60λ λ/20 30λ λ/20 120λ D λ/10 15λ λ/10 30λ

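39 Optimized second-order FDTD, N = 2, M = 2: $\dfrac{\partial f(x)}{\partial x} = a_1\,\dfrac{f(x + \frac{1}{2}\Delta) - f(x - \frac{1}{2}\Delta)}{\Delta} + \dfrac{1 - a_1}{3}\,\dfrac{f(x + \frac{3}{2}\Delta) - f(x - \frac{3}{2}\Delta)}{\Delta} + O(\Delta^{2})$. Error between the wavenumber $k$ and the numerical wavenumber $\bar{k}$ as a function of $a_1$: $\Theta = \displaystyle\int_{-\beta}^{\beta} \left(k\Delta - \bar{k}\Delta\right)^{2} d(k\Delta)$, with $\bar{k} = \dfrac{2}{\Delta}\left[a_1\left(\sin\dfrac{k\Delta}{2} - \dfrac{1}{3}\sin\dfrac{3k\Delta}{2}\right) + \dfrac{1}{3}\sin\dfrac{3k\Delta}{2}\right]$, $-\pi \le \beta \le \pi$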

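40 Optimized second-order FDTD: β and $a_1$. For a given β, $a_1$ follows from $\partial\Theta/\partial a_1 = 0$: $a_1 = \dfrac{8\left(27\sin\frac{\beta}{2} - \sin\frac{3\beta}{2}\right) - 12\beta\left(9\cos\frac{\beta}{2} - \cos\frac{3\beta}{2}\right)}{60\beta - 90\sin\beta + 18\sin 2\beta - 2\sin 3\beta} + \dfrac{6\beta - 18\sin\beta + 9\sin 2\beta - 2\sin 3\beta}{60\beta - 90\sin\beta + 18\sin 2\beta - 2\sin 3\beta}$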

41 10-1 Dispersion error e θφ Courant number FDTD(2,2) FDTD(2,4) Tam 1993 Wang 1996 Proposed 2

42

43 0.002 measurement measurement FDTD Opt.FDTD Electric field Ex[V/m] Electric field Ex [V/m] time [ns] FDTD time [ns] FDTD

44 FDTD FDTD PC FPGA (Field-Programmable Gate Array) Cell Broadband Engine (Cell/B.E.) GPU (Graphics Processing Unit) FDTD CIP

45 Cell/B.E. FDTD Cell/B.E. SONY IBM CPU PS3 PS3 Cell/B.E. 1 8

46 Main memory Main memory : SPE FDTD t t n +2 n +3/2 n +1 n +1/2 n SPE 2 SPE 1 n +2 n +3/2 n +1 n +1/2 n i 2 i 1 i 3/2 i 1/2 i i +1 i +2 i +1/2 i +3/2 z i 2 i 1 i 3/2 i 1/2 i i +1 i +2 i +1/2 i +3/2 i +5/2 z

47 PS3 FDTD Speedup Ratio TSP Large TSP Small Parallel Large Parallel Small Ideal Number of SPE(s) Xeon 2.8GHz MacPro 10

48 PC PC PC SCore Clustermatic Los Alamos National Lab. Windows HPC Server 2008 (Microsoft) OS Live Linux PC KNOPPIX PC DHCP

49 IPv6 PC USB/CD/DVD 1 PC PC PC Live Linux USB/CD/DVD Linux IPv6 PC Live Linux OS PC PC

50 HTTP-FUSE-KNOPPIX PC Live Linux USB/CD/DVD PC PC /home block file kernel magic packet NFS HTTP TFTP WOL PC 01 PC 02 PC 03 PC 04 PC n server client client client client Live Linux HTTP-FUSE-KNOPPIX USB or CD for server system for client boot loader, kernel, blockfile PC PC Live Linux

51 PC our system NFS_servr * [s] ratio [s] [s] 89.5 [s] # of PCs 1/2

52 IPv6 PC DHCP IP DHCP IPv6 MAC IP

53 PC boot time [s] number of PCs IPv4 NFS IPv4 SSHFS IPv6 SSHFS IPv4 NFS

54 NPB EP-D IPv4 NFS IPv6 SSHFS IPv4 SSHFS EP Class D[Mop/s] number of cores IPv4 NFS

55 2.

56

57 GPR (Ground Penetrating Radar) FDTD 1990 FDTD FPGA Cell/B.E. FDTD

58 2D/3D air 0.1 m 1.0 m J y z y O x ground ε_r = 4.0, σ = 0.001 S/m 0.1 m cylinder ε_r = 1.0, σ = 0.0 S/m

59 2D 3D problem size 1024 x x 256 x 256 source line current point current pulse Gaussian (-3 dB width: 0.5 ns) 2.5 x x/y +1.0 scan range (Δx = 0.05 m) (Δx = Δy = 0.1 m) # of scannings ground ε_r = 4.0, σ = 0.001 S/m cylinder ε_r = 1.0, σ = 0.0 S/m increments Δ = 0.01 m, Δs = s # of time steps 3000 ABC 1st-order Mur compiler CUDA 4.0 (gcc -O3)

60 GPU GeForce GTX PC 1 2 GPU 10

61 3

62 3 CPU/GPU CPU 980X x10 65 GTX 580 x10 30

63 FDTD X

64 d 0 = L ε 1,μ 1 ε 2,μ 2 ε 1,μ 1 0 0th stage L x d 1 0 1st stage L x d 2 0 2nd stage L x

65 Transmission coefficient (dB) d / λ 1st 2nd 3rd ε 2 /ε 1 =4.0 Transmission coefficient (dB) d / λ 3 layers 7 layers 15 layers

66 0 peak 15-5 Q Minimum transmission (dB) Q value stage number 0

67 20 Peak Q 1000 Maximum resonance (dB) Q value stage number 10

68 SiO2-TiO nm

69 1.2 1 Transmission measured FDTD d 2 /λ 2

70 1 6 GHz S21 FDTD

71 ε r =2.25

72 FDTD Transmission coefficient (dB) measured FDTD Frequency (GHz)

73 LLS FDTD 3 MW-FDTD PC PC

74 MW-FDTD Moving Window FDTD MW-FDTD FDTD MW-FDTD MW-FDTD LLS

75 3 MW-FDTD PC MW-FDTD 4 MW-FDTD F=1/16 FDTD 47 1/64 FDTD 20 1/16

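76 Electromagnetic field analysis of lightning discharge using a terrain model (Getting Started with GPGPU Using Tools, Spring 2012, Akihabara UDX, 12/5/24)

77 GPU (Graphics Processing Unit)

78 SfM FDTD FDTD SfM (Structure from Motion) FDTD

79 Realistic visualization of the radio-wave environment using AR. FDTD method: the computed results are the time responses of the six electromagnetic field components. Displaying the results in an easily understood way: 3-D visualization software such as AVS is costly and lacks realism. AR (Augmented Reality): a technique for mapping artificial objects onto real video. Visualization of the Poynting vector distribution using AR.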

GPU GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1

GPU GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1 GPU 4 2010 8 28 1 GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1 Register & Shared Memory ( ) CPU CPU(Intel Core i7 965) GPU(Tesla

More information

iphone GPGPU GPU OpenCL Mac OS X Snow LeopardOpenCL iphone OpenCL OpenCL NVIDIA GPU CUDA GPU GPU GPU 15 GPU GPU CPU GPU iii OpenMP MPI CPU OpenCL CUDA OpenCL CPU OpenCL GPU NVIDIA Fermi GPU Fermi GPU GPU

More information

main.dvi

main.dvi PC 1 1 [1][2] [3][4] ( ) GPU(Graphics Processing Unit) GPU PC GPU PC ( 2 GPU ) GPU Harris Corner Detector[5] CPU ( ) ( ) CPU GPU 2 3 GPU 4 5 6 7 1 toyohiro@isc.kyutech.ac.jp 45 2 ( ) CPU ( ) ( ) () 2.1

More information

07-二村幸孝・出口大輔.indd

07-二村幸孝・出口大輔.indd GPU Graphics Processing Units HPC High Performance Computing GPU GPGPU General-Purpose computation on GPU CPU GPU GPU *1 Intel Quad-Core Xeon E5472 3.0 GHz 2 6 MB L2 cache 1600 MHz FSB 80 GFlops 1 nvidia

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2018.06.04 2018.06.04 1 / 62 2018.06.04 2 / 62 Windows, Mac Unix 0444-J 2018.06.04 3 / 62 Part I Unix GUI CUI: Unix, Windows, Mac OS Part II 2018.06.04 4 / 62 0444-J ( : ) 6 4 ( ) 6 5 * 6 19 SX-ACE * 6

More information

GPU n Graphics Processing Unit CG CAD

GPU n Graphics Processing Unit CG CAD GPU 2016/06/27 第 20 回 GPU コンピューティング講習会 ( 東京工業大学 ) 1 GPU n Graphics Processing Unit CG CAD www.nvidia.co.jp www.autodesk.co.jp www.pixar.com GPU n GPU ü n NVIDIA CUDA ü NVIDIA GPU ü OS Linux, Windows, Mac

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2018.09.10 furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 1 / 59 furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 2 / 59 Windows, Mac Unix 0444-J furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 3 / 59 Part I Unix GUI CUI:

More information

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h 23 FPGA CUDA Performance Comparison of FPGA Array with CUDA on Poisson Equation (lijiang@sekine-lab.ei.tuat.ac.jp), (kazuki@sekine-lab.ei.tuat.ac.jp), (takahashi@sekine-lab.ei.tuat.ac.jp), (tamukoh@cc.tuat.ac.jp),

More information

untitled

untitled A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }

More information

untitled

untitled A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }

More information

1重谷.PDF

1重谷.PDF RSCC RSCC RSCC BMT 1 6 3 3000 3000 200310 1994 19942 VPP500/32PE 19992 VPP700E/128PE 160PE 20043 2 2 PC Linux 2048 CPU Intel Xeon 3.06GHzDual) 12.5 TFLOPS SX-7 32CPU/256GB 282.5 GFLOPS Linux 3 PC 1999

More information

TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日

TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日 TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日 目次 1. TSUBAMEのGPU 環境 2. プログラム作成 3. プログラム実行 4. 性能解析 デバッグ サンプルコードは /work0/gsic/seminars/gpu- 2011-09- 28 からコピー可能です 1.

More information

GPGPU

GPGPU GPGPU 2013 1008 2015 1 23 Abstract In recent years, with the advance of microscope technology, the alive cells have been able to observe. On the other hand, from the standpoint of image processing, the

More information

EGunGPU

EGunGPU Super Computing in Accelerator simulations - Electron Gun simulation using GPGPU - K. Ohmi, KEK-Accel Accelerator Physics seminar 2009.11.19 Super computers in KEK HITACHI SR11000 POWER5 16 24GB 16 134GFlops,

More information

HP High Performance Computing(HPC)

HP High Performance Computing(HPC) ACCELERATE HP High Performance Computing HPC HPC HPC HPC HPC 1000 HPHPC HPC HP HPC HPC HPC HP HPCHP HP HPC 1 HPC HP 2 HPC HPC HP ITIDC HP HPC 1HPC HPC No.1 HPC TOP500 2010 11 HP 159 32% HP HPCHP 2010 Q1-Q4

More information

スライド 1

スライド 1 GPU クラスタによる格子 QCD 計算 広大理尾崎裕介 石川健一 1.1 Introduction Graphic Processing Units 1 チップに数百個の演算器 多数の演算器による並列計算 ~TFLOPS ( 単精度 ) CPU 数十 GFLOPS バンド幅 ~100GB/s コストパフォーマンス ~$400 GPU の開発環境 NVIDIA CUDA http://www.nvidia.co.jp/object/cuda_home_new_jp.html

More information

develop

develop SCore SCore 02/03/20 2 1 HA (High Availability) HPC (High Performance Computing) 02/03/20 3 HA (High Availability) Mail/Web/News/File Server HPC (High Performance Computing) Job Dispatching( ) Parallel

More information

! 行行 CPUDSP PPESPECell/B.E. CPUGPU 行行 SIMD [SSE, AltiVec] 用 HPC CPUDSP PPESPE (Cell/B.E.) SPE CPUGPU GPU CPU DSP DSP PPE SPE SPE CPU DSP SPE 2

! 行行 CPUDSP PPESPECell/B.E. CPUGPU 行行 SIMD [SSE, AltiVec] 用 HPC CPUDSP PPESPE (Cell/B.E.) SPE CPUGPU GPU CPU DSP DSP PPE SPE SPE CPU DSP SPE 2 ! OpenCL [Open Computing Language] 言 [OpenCL C 言 ] CPU, GPU, Cell/B.E.,DSP 言 行行 [OpenCL Runtime] OpenCL C 言 API Khronos OpenCL Working Group AMD Broadcom Blizzard Apple ARM Codeplay Electronic Arts Freescale

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2016.06.06 2016.06.06 1 / 60 2016.06.06 2 / 60 Windows, Mac Unix 0444-J 2016.06.06 3 / 60 Part I Unix GUI CUI: Unix, Windows, Mac OS Part II 0444-J 2016.06.06 4 / 60 ( : ) 6 6 ( ) 6 10 6 16 SX-ACE 6 17

More information

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装 2010 GPGPU 2010 9 29 MPI/Pthread (DDM) DDM CPU CPU CPU CPU FEM GPU FEM CPU Mult - NUMA Multprocessng Cell GPU Accelerator, GPU CPU Heterogeneous computng L3 cache L3 cache CPU CPU + GPU GPU L3 cache 4

More information

supercomputer2010.ppt

supercomputer2010.ppt nanri@cc.kyushu-u.ac.jp 1 !! : 11 12! : nanri@cc.kyushu-u.ac.jp! : Word 2 ! PC GPU) 1997 7 http://wiredvision.jp/news/200806/2008062322.html 3 !! (Cell, GPU )! 4 ! etc...! 5 !! etc. 6 !! 20km 40 km ) 340km

More information

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2 FFT 1 Fourier fast Fourier transform FFT FFT FFT 1 FFT FFT 2 Fourier 2.1 Fourier FFT Fourier discrete Fourier transform DFT DFT n 1 y k = j=0 x j ω jk n, 0 k n 1 (1) x j y k ω n = e 2πi/n i = 1 (1) n DFT

More information

Microsoft PowerPoint - GPU_computing_2013_01.pptx

Microsoft PowerPoint - GPU_computing_2013_01.pptx GPU コンピューティン No.1 導入 東京工業大学 学術国際情報センター 青木尊之 1 GPU とは 2 GPGPU (General-purpose computing on graphics processing units) GPU を画像処理以外の一般的計算に使う GPU の魅力 高性能 : ハイエンド GPU はピーク 4 TFLOPS 超 手軽さ : 普通の PC にも装着できる 低価格

More information

HPC (pay-as-you-go) HPC Web 2

HPC (pay-as-you-go) HPC Web 2 ,, 1 HPC (pay-as-you-go) HPC Web 2 HPC Amazon EC2 OpenFOAM GPU EC2 3 HPC MPI MPI Courant 1 GPGPU MPI 4 AMAZON EC2 GPU CLUSTER COMPUTE INSTANCE EC2 GPU (cg1.4xlarge) ( N. Virgina ) Quadcore Intel Xeon 5570

More information

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU GPGPU (I) GPU GPGPU 1 GPU(Graphics Processing Unit) GPU GPGPU(General-Purpose computing on GPUs) GPU GPGPU GPU ( PC ) PC PC GPU PC PC GPU GPU 2008 TSUBAME NVIDIA GPU(Tesla S1070) TOP500 29 [1] 2009 AMD

More information

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N GPU 1 1 2 1, 3 2, 3 (Graphics Unit: GPU) GPU GPU GPU Evaluation of GPU Computing Based on An Automatic Program Generation Technology Makoto Sugawara, 1 Katsuto Sato, 1 Kazuhiko Komatsu, 2 Hiroyuki Takizawa

More information

( CUDA CUDA CUDA CUDA ( NVIDIA CUDA I

(    CUDA CUDA CUDA CUDA (  NVIDIA CUDA I GPGPU (II) GPGPU CUDA 1 GPGPU CUDA(CUDA Unified Device Architecture) CUDA NVIDIA GPU *1 C/C++ (nvcc) CUDA NVIDIA GPU GPU CUDA CUDA 1 CUDA CUDA 2 CUDA NVIDIA GPU PC Windows Linux MaxOSX CUDA GPU CUDA NVIDIA

More information

Vol.214-HPC-145 No /7/3 C #pragma acc directive-name [clause [[,] clause] ] new-line structured block Fortran!$acc directive-name [clause [[,] c

Vol.214-HPC-145 No /7/3 C #pragma acc directive-name [clause [[,] clause] ] new-line structured block Fortran!$acc directive-name [clause [[,] c Vol.214-HPC-145 No.45 214/7/3 OpenACC 1 3,1,2 1,2 GPU CUDA OpenCL OpenACC OpenACC High-level OpenACC CPU Intex Xeon Phi K2X GPU Intel Xeon Phi 27% K2X GPU 24% 1. TSUBAME2.5 CPU GPU CUDA OpenCL CPU OpenMP

More information

XACCの概要

XACCの概要 2 global void kernel(int a[max], int llimit, int ulimit) {... } : int main(int argc, char *argv[]){ MPI_Int(&argc, &argc); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size); dx

More information

PC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 P

PC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 P PC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 PC PC PC PC PC Key Words:Grid, PC Cluster, Distributed

More information

Slides: TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments

Slides: TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments 計算機アーキテクチャ第 11 回 マルチプロセッサ 本資料は授業用です 無断で転載することを禁じます 名古屋大学 大学院情報科学研究科 准教授加藤真平 デスクトップ ジョブレベル並列性 スーパーコンピュータ 並列処理プログラム プログラムの並列化 for (i = 0; i < N; i++) { x[i] = a[i] + b[i]; } プログラムの並列化 x[0] = a[0] + b[0];

More information

卒業論文

卒業論文 PC OpenMP SCore PC OpenMP PC PC PC Myrinet PC PC 1 OpenMP 2 1 3 3 PC 8 OpenMP 11 15 15 16 16 18 19 19 19 20 20 21 21 23 26 29 30 31 32 33 4 5 6 7 SCore 9 PC 10 OpenMP 14 16 17 10 17 11 19 12 19 13 20 1421

More information

09中西

09中西 PC NEC Linux (1) (2) (1) (2) 1 Linux Linux 2002.11.22) LLNL Linux Intel Xeon 2300 ASCIWhite1/7 / HPC (IDC) 2002 800 2005 2004 HPC 80%Linux) Linux ASCI Purple (ASCI 100TFlops Blue Gene/L 1PFlops (2005)

More information

FINAL PROGRAM 22th Annual Workshop SWoPP / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing

FINAL PROGRAM 22th Annual Workshop SWoPP / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing FINAL PROGRAM 22th Annual Workshop SWoPP 2009 2009 / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2009 8 4 ( ) 8 6 ( ) 981-0933 1-2-45 http://www.forestsendai.jp

More information

GPU チュートリアル :OpenACC 篇 Himeno benchmark を例題として 高エネルギー加速器研究機構 (KEK) 松古栄夫 (Hideo Matsufuru) 1 December 2018 HPC-Phys 理化学研究所 共通コードプロジェクト

GPU チュートリアル :OpenACC 篇 Himeno benchmark を例題として 高エネルギー加速器研究機構 (KEK) 松古栄夫 (Hideo Matsufuru) 1 December 2018 HPC-Phys 理化学研究所 共通コードプロジェクト GPU チュートリアル :OpenACC 篇 Himeno benchmark を例題として 高エネルギー加速器研究機構 (KEK) 松古栄夫 (Hideo Matsufuru) 1 December 2018 HPC-Phys 勉強会 @ 理化学研究所 共通コードプロジェクト Contents Hands On 環境について Introduction to GPU computing Introduction

More information

FINAL PROGRAM 25th Annual Workshop SWoPP / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012

FINAL PROGRAM 25th Annual Workshop SWoPP / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012 FINAL PROGRAM 25th Annual Workshop SWoPP 2012 2012 / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012 8 1 ( ) 8 3 ( ) 680-0017 101-5 http://www.torikenmin.jp/kenbun/

More information

untitled

untitled taisuke@cs.tsukuba.ac.jp http://www.hpcs.is.tsukuba.ac.jp/~taisuke/ CP-PACS HPC PC post CP-PACS CP-PACS II 1990 HPC RWCP, HPC かつての世界最高速計算機も 1996年11月のTOP500 第一位 ピーク性能 614 GFLOPS Linpack性能 368 GFLOPS (地球シミュレータの前

More information

Express5800/140Ma

Express5800/140Ma Pentium Xeon Express 1. N8500-324 N8500-325 N8500-326 N8500-327 (X/450(512)) (X/450(512)-25AW) (X/450(1)) (X/450(1)-25AW) Windows NT Server 4.0 CPU Pentium Xeon 450MHz1 4 L1 32KB L2 512KB 1MB CD-ROM LAN

More information

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted DEGIMA LINPACK Energy Performance for LINPACK Benchmark on DEGIMA 1 AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK 1.4698 GFlops/Watt 1.9658 GFlops/Watt Abstract GPU Computing has

More information

MATLAB® における並列・分散コンピューティング ~ Parallel Computing Toolbox™ & MATLAB Distributed Computing Server™ ~

MATLAB® における並列・分散コンピューティング ~ Parallel Computing Toolbox™ & MATLAB Distributed Computing Server™ ~ MATLAB における並列 分散コンピューティング ~ Parallel Computing Toolbox & MATLAB Distributed Computing Server ~ MathWorks Japan Application Engineering Group Takashi Yoshida 2016 The MathWorks, Inc. 1 System Configuration

More information

FIT2013( 第 12 回情報科学技術フォーラム ) I-032 Acceleration of Adaptive Bilateral Filter base on Spatial Decomposition and Symmetry of Weights 1. Taiki Makishi Ch

FIT2013( 第 12 回情報科学技術フォーラム ) I-032 Acceleration of Adaptive Bilateral Filter base on Spatial Decomposition and Symmetry of Weights 1. Taiki Makishi Ch I-032 Acceleration of Adaptive Bilateral Filter base on Spatial Decomposition and Symmetry of Weights 1. Taiki Makishi Chikatoshi Yamada Shuichi Ichikawa Gaussian Filter GF GF Bilateral Filter BF CG [1]

More information

Shonan Institute of Technology MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Paral

Shonan Institute of Technology MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Paral MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Parallel Computer Ships1 Makoto OYA*, Hiroto MATSUBARA**, Kazuyoshi SAKURAI** and Yu KATO**

More information

HPEハイパフォーマンスコンピューティング ソリューション

HPEハイパフォーマンスコンピューティング ソリューション HPE HPC / AI Page 2 No.1 * 24.8% No.1 * HPE HPC / AI HPC AI SGIHPE HPC / AI GPU TOP500 50th edition Nov. 2017 HPE No.1 124 www.top500.org HPE HPC / AI TSUBAME 3.0 2017 7 AI TSUBAME 3.0 HPE SGI 8600 System

More information

10D16.dvi

10D16.dvi D IEEJ Transactions on Industry Applications Vol.136 No.10 pp.686 691 DOI: 10.1541/ieejias.136.686 NW Accelerating Techniques for Sequence Alignment based on an Extended NW Algorithm Jin Okaze, Non-member,

More information

21 20 20413525 22 2 4 i 1 1 2 4 2.1.................................. 4 2.1.1 LinuxOS....................... 7 2.1.2....................... 10 2.2........................ 15 3 17 3.1.................................

More information

Express5800/140Ma

Express5800/140Ma Pentium Xeon Express 1. N8500-479 N8500-480 N8500-489,-490 N8500-491,-492 (-X/550(512)-25AWS) (-X/550(1)-25AWS) (-X/550(512)) (-X/550(1)) (-X/550(512)-25AWE) (-X/550(1)-25AWE) CPU L1 Pentium Xeon 550MHz1

More information

DO 時間積分 START 反変速度の計算 contravariant_velocity 移流項の計算 advection_adams_bashforth_2nd DO implicit loop( 陰解法 ) 速度勾配, 温度勾配の計算 gradient_cell_center_surface 速

DO 時間積分 START 反変速度の計算 contravariant_velocity 移流項の計算 advection_adams_bashforth_2nd DO implicit loop( 陰解法 ) 速度勾配, 温度勾配の計算 gradient_cell_center_surface 速 1 1, 2 1, 2 3 2, 3 4 GP LES ASUCA LES NVIDIA CUDA LES 1. Graphics Processing Unit GP General-Purpose SIMT Single Instruction Multiple Threads 1 2 3 4 1),2) LES Large Eddy Simulation 3) ASUCA 4) LES LES

More information

strtok-count.eps

strtok-count.eps IoT FPGA 2016/12/1 IoT FPGA 200MHz 32 ASCII PCI Express FPGA OpenCL (Volvox) Volvox CPU 10 1 IoT (Internet of Things) 2020 208 [1] IoT IoT HTTP JSON ( Python Ruby) IoT IoT IoT (Hadoop [2] ) AI (Artificial

More information

211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G

211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS211 211/1/18 GPU 4 8 BLAS 4 8 BLAS Basic Linear Algebra Subprograms GPU Graphics Processing Unit 4 8 double 2 4 double-double DD 4 4 8 quad-double

More information

IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1

IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1 SMYLE OpenCL 128 1 1 1 1 1 2 2 3 3 3 (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 128 SMYLEref SMYLE OpenCL SMYLE OpenCL Implementation and Evaluations on 128 Cores Takuji Hieda 1 Noriko Etani

More information

高校生の就職への数学II

高校生の就職への数学II II O Tped b L A TEX ε . II. 3. 4. 5. http://www.ocn.ne.jp/ oboetene/plan/ 7 9 i .......................................................................................... 3..3...............................

More information

( )

( ) 1. 2. 3. 4. 5. ( ) () http://www-astro.physics.ox.ac.uk/~wjs/apm_grey.gif http://antwrp.gsfc.nasa.gov/apod/ap950917.html ( ) SDSS : d 2 r i dt 2 = Gm jr ij j i rij 3 = Newton 3 0.1% 19 20 20 2 ( ) 3 3

More information

HP Workstation 総合カタログ

HP Workstation 総合カタログ HP Workstation Z HP 6 Z HP HP Z840 Workstation P.9 HP Z640 Workstation & CPU P.10 HP Z440 Workstation P.11 17.3in WIDE HP ZBook 17 G2 Mobile Workstation P.15 15.6in WIDE HP ZBook 15 G2 Mobile Workstation

More information

Cell/B.E. BlockLib

Cell/B.E. BlockLib Cell/B.E. BlockLib 17 17115080 21 2 10 i Cell/B.E. BlockLib SIMD CELL SIMD Cell Cell BlockLib BlockLib NestStep libspe1 Cell SDK 3.1 libspe2 BlockLib Cell SDK 3.1 NestStep libspe2 BlockLib BlockLib libspe1

More information

A Study of Adaptive Array Implimentation for mobile comunication in cellular system GD133
