GRAPE GRAPE-DR V-GRAPE

Size: px
Start display at page:

Download "GRAPE GRAPE-DR V-GRAPE"

Transcription

1 GRAPE-DR / 2006/11/20-22

2 GRAPE GRAPE-DR V-GRAPE

3

4

5

6

7 ( ) SDSS

8 Genzel et al 2003 Adaptive Optics SgrA ( )

9 12 1

10 :

11 GRAPE : (Barnes-Hut tree, FMM, Particle- Mesh Ewald(PPPM)...): ( )

12 1988

13 GRAPE-1(1989) Mflops

14 GRAPE-2(1990) 8 ( ) 40Mflops

15 GRAPE-3(1991) MHz 7.2Gflops

16 GRAPE-3 1µm MHz 600 Mflops

17 GRAPE-4(1995) Tflops

18 GRAPE-4 Xi Xi sqrt Pcut Fcut Xi Xi m/r FiFiPi m j Xi Xi r 2 Xi Xi Func. eval. Xi Xi Xi Xi Xi Xi Xi Xi m/r 3 Xi Xi Xi FiFiFi Xj Xi Xi Vi Xi Xi r. v m/r 5 Xi Xi Vj Xi Xi Xi Xi FiFiJi Xi Xi Xi Xi 1µm 10 (40 ) 640Mflops

19 GRAPE-6(2002) Tflops

20 パイプライン LSI 0.25 µm ルール (東芝 TC-240, 1.8M ゲート) 90 MHz 動作 6 パイプラインを集積 チップあたり 31 Gflops

21 2006 GRAPE-6 Core 2 Extreme 250nm 65nm 90MHz 2.93GHz 32.4Gflops 23.44Gflops 10W 75W 1W 3.24Gflops Gflops

22 GRAPE-4

23 GRAPE-6 MDGRAPE-3 : MDGRAPE-4, 20Pflops@2010 MDGRAPE-3 GRAPE-DR

24 GRAPE-DR GRAPE : 2 Petaflops Tflops GRAPE : GRAPE

25 GRAPE ( ( N )) µm µm nm nm 10

26 1.

27 1. 2.

28

29 GRAPE-DR (3)

30 1

31 : ( ) 1. GRAPE SIMD

32 SIMD SIMD (Single Instruction Multiple Data): GRAPE

33 SIMD SIMD SSE MMX SIMD GRAPE-DR SIMD

34 SIMD Illiac IV, Goodyear MPP, ICL DAP, TMC CM-2, MASPAR MP-1 ALU REG MEM ALU REG MEM ALU REG MEM ALU REG MEM ALU REG MEM : : SIMD

35 SIMD Pentium III, R0 R1 R2 R3 R4 R5 R6 R7 W0 W1 W0 W1 W0 W1 W0 W1 W0 W1 W0 W1 W0 W1 W0 W1 W2 W3 W2 W3 W2 W3 W2 W3 W2 W3 W2 W3 W2 W3 W2 W3 ALU0 ALU1 ALU2 ALU3 1 : 4

36 nyo d4prqts B8C*DFEHGFI 7KJ GRAPE-DR SIMD!"$# %'& (*)+,-. /0!"$#%ˆ $Š 'ŒŽ (* & ) \Y]_^[`baTced 1$243$5687*9 (FPGA :';$< ) RTSVUTWYX[Z yz{z z} ~ $ƒ Q 0 w4xzyz{ L$M4N'OQP SING u Xtv (PE) 1 PE = + ( ) (PE ) PE (BB)

37 *,+ (M) PE PEID BBID A x + "! B T 32W 256W ALU 256 # $ % & (' #)$ & (' (K M )

38

39 32PE( ) 16 18mm

40 GRAPE-DR 500MHz 100 Gflops ( )

41 GRAPE-DR 別ボード こっちが プロジェ クト公式 中身は殆ど同じ 何故か大きい LINPACK が動作 したらしい

42 GRAPE

43 : g i = j f(x i, x j ) i j j i j, i j ( )

44 ( 2006) /VARI xi, yi, zi, e2; /VARJ xj, yj, zj, mj; /VARF fx, fy, fz; dx = xi - xj; dy = yi - yj; dz = zi - zj; r2 = dx*dx + dy*dy + dz*dz + e2; r3i= powm32(r2); ff = mj*r3i; fx += ff*dx; fy += ff*dy; fz += ff*dz; GRAPE PGR (FPGA PROGRAPE D 2006)

45 / int SING_send_j_particle(struct grape_j_particle_struct *jp, int index_in_em); int SING_send_i_particle(struct grape_i_particle_struct *ip, int n); int SING_get_result(struct grape_result_struct *rp); void SING_grape_init(); int SING_grape_run(int n);

46 2 ( )

47 V-GRAPE GRAPE-DR = V-GRAPE

48 GRAPE-DR 256Gflops MDGRAPE-3 FPGA FFT CG 2

49 FFT CG :

50 FFT FFT FFT : 10 log n 4GB/s 10 Gflops CPU

51 CG : O(10)

52 GRAPE-DR: 1MB Intel Itanium : 24MB? DRAM : 32 MB

53 V-GRAPE PE PE PE PE PE PE PE PE GRAPE-DR V-GRAPE PE

54 V-GRAPE / : ( ) :

55 : PIC

56 PIC Charge assignment Charge assignment: GRAPE- DR :

57 Charge 1 : 50 ( ) 1 : 12 ( ) : 1 4 GRAPE-DR : 2 GB/s 8 Gflops

58 GRAPE-DR GPGPU V-GRAPE

59 GPGPU nvidia 8800: C 768MB 90GB/s(SX-7 3 ) GPU C 400Gflops 1 (8 )

60 V-GRAPE 128MB GB/s 1.5Tflops (50 )

61 : = 3 10 GRAPE-DR 100Gflops

62 GRAPE LSI GRAPE-DR SIMD GRAPE V-GRAPE PIC GRAPE-DR GPGPU V-GRAPE

63

64 Memory Wall : : : :

65 1990 I/O

66

67 : 30

68 V-GRAPE BLAS, LAPACK PE PGDL ( FPGA ) SPH ( 150)

69 :

70 (M. Flynn) SISD/SIMD/MISD/MIMD (SI) (MI) (SD) (MD) SIMD SIMD ( ) MIMD

71 SIMD GRAPE ( ) : : ( ) : 1000 ( / )

72 (PE) (j- ) j- j- j- i- PE PE PE PE PE i- PE PE PE PE PE i- PE PE PE PE PE i- PE PE PE PE PE i- PE PE PE PE PE i- PE PE PE PE PE j- j- (GRAPE-6 ) 2 : 2

73 PE PE PE PE PE PE broadcast memory PE PE PE PE broadcast memory PE PE PE PE broadcast memory PE PE PE PE broadcast memory ( ) Memory controller/host

74 SING: Sing Is Not GRAPE DRAM DRAM DRAM DRAM FPGA CP SING FPGA CP SING FPGA CP SING FPGA CP SING FPGA Host interface PCI-X/PCIE PCI

75 GRAPE : SIMD GDR : (FPGA ) =

76 PE PE ( )

77 var vector long xi hlt flt64to72 var vector long yi hlt flt64to72 var vector long zi hlt flt64to72 var vector short idxi hlt fix32to36ru bvar long xj elt flt64to72 bvar long yj elt flt64to72 bvar long zj elt flt64to72 bvar long vxj xj bvar short mj elt flt64to36 bvar short eps2 elt flt64to36 bvar short idxj elt fix32to36ru var short lmj var short leps2 var short lidxj var vector long accx rrn flt72to64 fadd var vector long accy rrn flt72to64 fadd var vector long accz rrn flt72to64 fadd var vector long pot rrn flt72to64 fadd hlt, elt, rrn

78 loop initialization vlen 4 uxor $t $t $t upassa $ti $ti $lr40v upassa $t $t $lr48v upassa $t $t $lr56v upassa $t $t pot loop body vlen 3 bm vxj $lr0v vlen 1 bm mj lmj bm eps2 leps2 bm idxj lidxj ( ) ( ) ( )

79 vlen 4 nop upassa idxi idxi $t uxor $ti lidxj $t moi 2 ( ) ulnot $ti $ti $t # mreg 1 indicates i!= j moi 0 nop fsub $lr0 xi $r6v $t fsub $lr2 yi $r10v ; fmul $ti $ti $t fsub $lr4 zi $r14v fmul $r10v $r10v $r18v ; fadd $t leps2 $t fmul $r14v $r14v ; fadd $fb $ti $t fadd $fb $ti $r18v $t # rsq is now in r18 t, dx, dy,dz are in 2

80 ( ) ulsr $ti il"60" $t $lr22v ulsr $ti il"1" $t uadd $ti $lr22v $t usub hl"9fd" $ti $t # $lr8v 1.5 ulsl $ti il"60" $lr30v moi 1 uand il"1" $lr22v moi 0 uand $r18v h"000ffffff" $t uor $ti h"3ff000000" $t fmul $ti f"0.57" $t fsub f"1.57" $ti $t mi 1 fmul f"1.414" $ti $t mi 0 nop fmul $t $lr30v $t $r22v # Here the result is the initial guess r 3 1

81 ( ) fmul $r18v $r18v $r26v $t fmul $r18v $ti $r26v $t fmul $ti f"0.5" $r26v # r26v is a**3/2 fmul $r22v $r22v $t fmul $ti $r26v $t fsub f"1.5" $ti $t fmul $r22v $ti $t $r22v fmul $ti $ti $t fmul $ti $r26v $t ( ) fsub f"1.5" $ti $t fmul $r22v $ti $t $r22v fmul $ti $ti $t fmul $ti $r26v $t fsub f"0.5" $ti $t fmul $r22v $ti $t fadd $r22v $ti $t fmul lmj $ti $t $r22v

82 ( ) mi 2 fmul $r6v $ti ; upassa pot pot $lr0v fmul $r10v $t ; fadd $fb $lr40v $lr40v accx fmul $r14v $t ; fadd $fb $lr48v $lr48v accy fmul $r18v $t ; fadd $fb $lr56v $lr56v accz fadd $fb $lr0v pot

83 int SING_send_j_particle(struct grape_j_particle_struct *jp, int index_in_em); int SING_send_i_particle(struct grape_i_particle_struct *ip, int n); int SING_get_result(struct grape_result_struct *rp); void SING_grape_init(); int SING_grape_run(int n); GRAPE-3/5

84 struct grape_j_particle_struct{ double xj; double yj; double zj; double mj; double eps2; UINT32 idxj; }; struct grape_i_particle_struct{ double xi; double yi; double zi; UINT32 idxi; }; struct grape_result_struct{ double accx; double accy; double accz; double pot; };

85 17mm

86

87 PE

GRAPE GRAPE-DR V-GRAPE

GRAPE GRAPE-DR V-GRAPE V-GRAPE / CCSR 2007/1/24 GRAPE GRAPE-DR V-GRAPE http://antwrp.gsfc.nasa.gov/apod/ap950917.html ( ) SDSS GRAPE : (Barnes-Hut tree, FMM, Particle- Mesh Ewald(PPPM)...): ( ) 1988 GRAPE-1(1989) 16 8 32

More information

HPC / (CfCA) HPC 2007/11/23-25

HPC / (CfCA) HPC 2007/11/23-25 HPC / (CfCA) HPC 2007/11/23-25 CfCA GRAPE GRAPE GRAPE-DR HPC : : 1 1 (II ) Ia 100 1 ( ) 0.1 pc 1 AU 3 : 1 100 Top-down Katz and Gunn 1992 Dark Matter + + DM, : :SPH 10 4 Cray YMP 500-1000 : 10 7 Saitoh

More information

GRAPE-DR /

GRAPE-DR / GRAPE-DR / GRAPE GRAPE-DR GRAPE ( ): (Barnes-Hut tree, FMM, Particle- Mesh Ewald(PPPM)...): ( ) 1988 32 IC 200 0.1m 3 400 GRAPE-1(1989) 16 8 32 48 240Mflops GRAPE-2(1990) 8 ( ) 40Mflops GRAPE-3(1991) 24

More information

Agenda GRAPE-MPの紹介と性能評価 GRAPE-MPの概要 OpenCLによる四倍精度演算 (preliminary) 4倍精度演算用SIM 加速ボード 6 processor elem with 128 bit logic Peak: 1.2Gflops

Agenda GRAPE-MPの紹介と性能評価 GRAPE-MPの概要 OpenCLによる四倍精度演算 (preliminary) 4倍精度演算用SIM 加速ボード 6 processor elem with 128 bit logic Peak: 1.2Gflops Agenda GRAPE-MPの紹介と性能評価 GRAPE-MPの概要 OpenCLによる四倍精度演算 (preliminary) 4倍精度演算用SIM 加速ボード 6 processor elem with 128 bit logic Peak: 1.2Gflops ボードの概要 Control processor (FPGA by Altera) GRAPE-MP chip[nextreme

More information

: 50 10 10 1. : : 3 : 4 : 2 2. : 1946 1975 1 : load: store: : : ( ) ( ) : 101 x 101 ------------- 101 101 ------------ 11001 2 ( ): 32 32 1 32 : 32 ( ) 32 ( ) : log 2 32 : : ( F) ( D) E W 1 4 : F D E

More information

ストリーミング SIMD 拡張命令2 (SSE2) を使用した SAXPY/DAXPY

ストリーミング SIMD 拡張命令2 (SSE2) を使用した SAXPY/DAXPY SIMD 2(SSE2) SAXPY/DAXPY 2.0 2000 7 : 248600J-001 01/12/06 1 305-8603 115 Fax: 0120-47-8832 * Copyright Intel Corporation 1999, 2000 01/12/06 2 1...5 2 SAXPY DAXPY...5 2.1 SAXPY DAXPY...6 2.1.1 SIMD C++...6

More information

untitled

untitled taisuke@cs.tsukuba.ac.jp http://www.hpcs.is.tsukuba.ac.jp/~taisuke/ CP-PACS HPC PC post CP-PACS CP-PACS II 1990 HPC RWCP, HPC かつての世界最高速計算機も 1996年11月のTOP500 第一位 ピーク性能 614 GFLOPS Linpack性能 368 GFLOPS (地球シミュレータの前

More information

並列計算の数理とアルゴリズム サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. このサンプルページの内容は, 初版 1 刷発行時のものです.

並列計算の数理とアルゴリズム サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます.  このサンプルページの内容は, 初版 1 刷発行時のものです. 並列計算の数理とアルゴリズム サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. http://www.morikita.co.jp/books/mid/080711 このサンプルページの内容は, 初版 1 刷発行時のものです. Calcul scientifique parallèle by Frédéric Magoulès and François-Xavier

More information

A 99% MS-Free Presentation

A 99% MS-Free Presentation A 99% MS-Free Presentation 2 Galactic Dynamics (Binney & Tremaine 1987, 2008) Dynamics of Galaxies (Bertin 2000) Dynamical Evolution of Globular Clusters (Spitzer 1987) The Gravitational Million-Body Problem

More information

アクセラレータのデモと プログラミング手法

アクセラレータのデモと プログラミング手法 アクセラレータのデモと プログラミング手法 会津大学中里直人 アクセラレータボードを使った高速化スクール 2009/12/07 アクセラレータとは (1) ホスト計算機を補佐して特定の計算を高速化する計算機デバイス ホスト (CPU) で動作するプログラムを補佐 アクセラレータの例 Cell/PowerXCell8iブレード ボード : 計算 GPU ボード (NVIDIA, AMD, S3) :

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2016.06.06 2016.06.06 1 / 60 2016.06.06 2 / 60 Windows, Mac Unix 0444-J 2016.06.06 3 / 60 Part I Unix GUI CUI: Unix, Windows, Mac OS Part II 0444-J 2016.06.06 4 / 60 ( : ) 6 6 ( ) 6 10 6 16 SX-ACE 6 17

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2018.06.04 2018.06.04 1 / 62 2018.06.04 2 / 62 Windows, Mac Unix 0444-J 2018.06.04 3 / 62 Part I Unix GUI CUI: Unix, Windows, Mac OS Part II 2018.06.04 4 / 62 0444-J ( : ) 6 4 ( ) 6 5 * 6 19 SX-ACE * 6

More information

( )

( ) 1. 2. 3. 4. 5. ( ) () http://www-astro.physics.ox.ac.uk/~wjs/apm_grey.gif http://antwrp.gsfc.nasa.gov/apod/ap950917.html ( ) SDSS : d 2 r i dt 2 = Gm jr ij j i rij 3 = Newton 3 0.1% 19 20 20 2 ( ) 3 3

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2018.09.10 furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 1 / 59 furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 2 / 59 Windows, Mac Unix 0444-J furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 3 / 59 Part I Unix GUI CUI:

More information

supercomputer2010.ppt

supercomputer2010.ppt nanri@cc.kyushu-u.ac.jp 1 !! : 11 12! : nanri@cc.kyushu-u.ac.jp! : Word 2 ! PC GPU) 1997 7 http://wiredvision.jp/news/200806/2008062322.html 3 !! (Cell, GPU )! 4 ! etc...! 5 !! etc. 6 !! 20km 40 km ) 340km

More information

untitled

untitled A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }

More information

untitled

untitled A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }

More information

EGunGPU

EGunGPU Super Computing in Accelerator simulations - Electron Gun simulation using GPGPU - K. Ohmi, KEK-Accel Accelerator Physics seminar 2009.11.19 Super computers in KEK HITACHI SR11000 POWER5 16 24GB 16 134GFlops,

More information

2005 1

2005 1 25 SPARCstation 2 CPU central processor unit 25 2 25 3 25 4 DRAM 25 5 25 6 : DRAM 25 7 2 25 8 2 25 9 2 bit: binary digit V 2V 25 2 2 2 2 4 5 2 6 3 7 25 A B C A B C A B C A B C A C A B 3 25 2 25 3 Co Cin

More information

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装 2010 GPGPU 2010 9 29 MPI/Pthread (DDM) DDM CPU CPU CPU CPU FEM GPU FEM CPU Mult - NUMA Multprocessng Cell GPU Accelerator, GPU CPU Heterogeneous computng L3 cache L3 cache CPU CPU + GPU GPU L3 cache 4

More information

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2 FFT 1 Fourier fast Fourier transform FFT FFT FFT 1 FFT FFT 2 Fourier 2.1 Fourier FFT Fourier discrete Fourier transform DFT DFT n 1 y k = j=0 x j ω jk n, 0 k n 1 (1) x j y k ω n = e 2πi/n i = 1 (1) n DFT

More information

untitled

untitled Power Wall HPL1 10 B/F EXTREMETECH Supercomputing director bets $2,000 that we won t have exascale computing by 2020 One of the biggest problems standing in our way is power. [] http://www.extremetech.com/computing/155941

More information

The 3 key challenges in programming for MC

The 3 key challenges in programming for MC Aug 3 06 Software &Solutions group Intel Intel Centrino Intel NetBurst Intel XScale Itanium Pentium Xeon Intel Core VTune Intel Corporation Intel NetBurst Pentium Xeon Pentium M Core 64 2 Intel Software

More information

iphone GPGPU GPU OpenCL Mac OS X Snow LeopardOpenCL iphone OpenCL OpenCL NVIDIA GPU CUDA GPU GPU GPU 15 GPU GPU CPU GPU iii OpenMP MPI CPU OpenCL CUDA OpenCL CPU OpenCL GPU NVIDIA Fermi GPU Fermi GPU GPU

More information

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted DEGIMA LINPACK Energy Performance for LINPACK Benchmark on DEGIMA 1 AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK 1.4698 GFlops/Watt 1.9658 GFlops/Watt Abstract GPU Computing has

More information

e Ž ¹ vµ q ¹¹¹ ¹¹¹¹¹ vµ j ¹¹¹ ¹¹¹¹ r µ ¹¹¹¹ ¹¹¹¹¹ µ ¹¹¹¹¹ ¹¹¹¹ µ ¹¹¹¹ ¹¹¹ vµ ¹¹¹¹ ¹¹¹¹ vµ Ž ¹¹¹ ¹¹¹¹ vµˆ ¹¹¹ ¹¹¹¹¹ µ ¹¹¹¹ ¹¹¹¹¹¹¹¹ µ ¹¹¹¹¹ ¹¹¹

e Ž ¹ vµ q ¹¹¹ ¹¹¹¹¹ vµ j ¹¹¹ ¹¹¹¹ r µ ¹¹¹¹ ¹¹¹¹¹ µ ¹¹¹¹¹ ¹¹¹¹ µ ¹¹¹¹ ¹¹¹ vµ ¹¹¹¹ ¹¹¹¹ vµ Ž ¹¹¹ ¹¹¹¹ vµˆ ¹¹¹ ¹¹¹¹¹ µ ¹¹¹¹ ¹¹¹¹¹¹¹¹ µ ¹¹¹¹¹ ¹¹¹ e Ž µ ¹¹¹ ¹¹¹ v µ ¹¹¹¹¹ ¹¹¹¹¹¹ rµ ¹¹¹¹ ¹¹¹ j µ r µž ¹¹¹¹¹ ¹¹¹¹ µ ¹¹¹ ¹¹¹¹ µ ¹¹¹¹ ¹¹¹¹ µ ¹¹¹¹¹ µ ¹¹¹¹¹¹ ¹¹¹¹¹ l vµ u ¹¹¹ ¹¹¹¹¹¹ µ ¹¹¹¹ ¹¹¹¹¹ µ µ ¹¹¹ ¹¹¹ µg ¹¹¹¹ ¹¹¹¹¹ r µ Ž ¹¹¹ ¹¹¹ vµ ¹¹¹¹ ¹¹¹¹ µ ¹¹¹¹¹

More information

ohpr.dvi

ohpr.dvi 2003-08-04 1984 VP-1001 CPU, 250 MFLOPS, 128 MB 2004ASCI Purple (LLNL)64 CPU 197, 100 TFLOPS, 50 TB, 4.5 MW PC 2 CPU 16, 4 GFLOPS, 32 GB, 3.2 kw 20028 CPU 640, 40 TFLOPS, 10 TB, 10 MW (ASCI: Accelerated

More information

( : December 27, 2015) CONTENTS I. 1 II. 2 III. 2 IV. 3 V. 5 VI. 6 VII. 7 VIII. 9 I. 1 f(x) f (x) y = f(x) x ϕ(r) (gradient) ϕ(r) (gradϕ(r) ) ( ) ϕ(r)

( : December 27, 2015) CONTENTS I. 1 II. 2 III. 2 IV. 3 V. 5 VI. 6 VII. 7 VIII. 9 I. 1 f(x) f (x) y = f(x) x ϕ(r) (gradient) ϕ(r) (gradϕ(r) ) ( ) ϕ(r) ( : December 27, 215 CONTENTS I. 1 II. 2 III. 2 IV. 3 V. 5 VI. 6 VII. 7 VIII. 9 I. 1 f(x f (x y f(x x ϕ(r (gradient ϕ(r (gradϕ(r ( ϕ(r r ϕ r xi + yj + zk ϕ(r ϕ(r x i + ϕ(r y j + ϕ(r z k (1.1 ϕ(r ϕ(r i

More information

untitled

untitled PC murakami@cc.kyushu-u.ac.jp muscle server blade server PC PC + EHPC/Eric (Embedded HPC with Eric) 1216 Compact PCI Compact PCIPC Compact PCISH-4 Compact PCISH-4 Eric Eric EHPC/Eric EHPC/Eric Gigabit

More information

Slides: TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments

Slides: TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments 計算機アーキテクチャ第 11 回 マルチプロセッサ 本資料は授業用です 無断で転載することを禁じます 名古屋大学 大学院情報科学研究科 准教授加藤真平 デスクトップ ジョブレベル並列性 スーパーコンピュータ 並列処理プログラム プログラムの並列化 for (i = 0; i < N; i++) { x[i] = a[i] + b[i]; } プログラムの並列化 x[0] = a[0] + b[0];

More information

262014 3 1 1 6 3 2 198810 2/ 198810 2 1 3 4 http://www.pref.hiroshima.lg.jp/site/monjokan/ 1... 1... 1... 2... 2... 4... 5... 9... 9... 10... 10... 10... 10... 13 2... 13 3... 15... 15... 15... 16 4...

More information

次世代スーパーコンピュータのシステム構成案について

次世代スーパーコンピュータのシステム構成案について 6 19 4 27 1. 2. 3. 3.1 3.2 A 3.3 B 4. 5. 2007/4/27 4 1 1. 2007/4/27 4 2 NEC NHF2 18 9 19 19 2 28 10PFLOPS2.5PB 30MW 3,200 18 12 12 SimFold, GAMESS, Modylas, RSDFT, NICAM, LatticeQCD, LANS HPL, NPB-FT 19

More information

(Basic Theory of Information Processing) 1

(Basic Theory of Information Processing) 1 (Basic Theory of Information Processing) 1 10 (p.178) Java a[0] = 1; 1 a[4] = 7; i = 2; j = 8; a[i] = j; b[0][0] = 1; 2 b[2][3] = 10; b[i][j] = a[2] * 3; x = a[2]; a[2] = b[i][3] * x; 2 public class Array0

More information

II 2 II

II 2 II II 2 II 2005 yugami@cc.utsunomiya-u.ac.jp 2005 4 1 1 2 5 2.1.................................... 5 2.2................................. 6 2.3............................. 6 2.4.................................

More information

HPCマシンの変遷と 今後の情報基盤センターの役割

HPCマシンの変遷と 今後の情報基盤センターの役割 筑波大学計算科学センターシンポジウム 計算機アーキテクトが考える 次世代スパコン 2006 年 4 月 5 日 村上和彰 九州大学 murakami@cc.kyushu-u.ac.jp 次世代スパコン ~ 達成目標と制約条件の整理 ~ 達成目標 性能目標 (2011 年 ) LINPACK (HPL):10PFlop/s 実アプリケーション :1PFlop/s 成果目標 ( 私見 ) 科学技術計算能力の国際競争力の向上ならびに維持による我が国の科学技術力

More information

GPUを用いたN体計算

GPUを用いたN体計算 単精度 190Tflops GPU クラスタ ( 長崎大 ) の紹介 長崎大学工学部超高速メニーコアコンピューティングセンターテニュアトラック助教濱田剛 1 概要 GPU (Graphics Processing Unit) について簡単に説明します. GPU クラスタが得意とする応用問題を議論し 長崎大学での GPU クラスタによる 取組方針 N 体計算の高速化に関する研究内容 を紹介します. まとめ

More information

main.dvi

main.dvi PC 1 1 [1][2] [3][4] ( ) GPU(Graphics Processing Unit) GPU PC GPU PC ( 2 GPU ) GPU Harris Corner Detector[5] CPU ( ) ( ) CPU GPU 2 3 GPU 4 5 6 7 1 toyohiro@isc.kyutech.ac.jp 45 2 ( ) CPU ( ) ( ) () 2.1

More information

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU GPGPU (I) GPU GPGPU 1 GPU(Graphics Processing Unit) GPU GPGPU(General-Purpose computing on GPUs) GPU GPGPU GPU ( PC ) PC PC GPU PC PC GPU GPU 2008 TSUBAME NVIDIA GPU(Tesla S1070) TOP500 29 [1] 2009 AMD

More information

FIT2013( 第 12 回情報科学技術フォーラム ) C-017 SIMD Implementation and evaluation of a morphological pattern spectrum using an highly-parallel SIMD matrix process

FIT2013( 第 12 回情報科学技術フォーラム ) C-017 SIMD Implementation and evaluation of a morphological pattern spectrum using an highly-parallel SIMD matrix process C-017 SIMD Implementation an evaluation of a morphological pattern pectrum uing an highly-parallel SIMD matrix proceor Yauhi Tukaa Tomohiro Takea Tohiya Hona Takehi Kumaki Takehi Ogura Takehi Fujino 1.

More information

スライド 1

スライド 1 swk(at)ic.is.tohoku.ac.jp 2 Outline 3 ? 4 S/N CCD 5 Q Q V 6 CMOS 1 7 1 2 N 1 2 N 8 CCD: CMOS: 9 : / 10 A-D A D C A D C A D C A D C A D C A D C ADC 11 A-D ADC ADC ADC ADC ADC ADC ADC ADC ADC A-D 12 ADC

More information

: , 2.0, 3.0, 2.0, (%) ( 2.

: , 2.0, 3.0, 2.0, (%) ( 2. 2017 1 2 1.1...................................... 2 1.2......................................... 4 1.3........................................... 10 1.4................................. 14 1.5..........................................

More information

Itanium2ベンチマーク

Itanium2ベンチマーク HPC CPU mhori@ile.osaka-u.ac.jp Special thanks Timur Esirkepov HPC 2004 2 25 1 1. CPU 2. 3. Itanium 2 HPC 2 1 Itanium2 CPU CPU 3 ( ) Intel Itanium2 NEC SX-6 HP Alpha Server ES40 PRIMEPOWER SR8000 Intel

More information

1重谷.PDF

1重谷.PDF RSCC RSCC RSCC BMT 1 6 3 3000 3000 200310 1994 19942 VPP500/32PE 19992 VPP700E/128PE 160PE 20043 2 2 PC Linux 2048 CPU Intel Xeon 3.06GHzDual) 12.5 TFLOPS SX-7 32CPU/256GB 282.5 GFLOPS Linux 3 PC 1999

More information

( ) X x, y x y x y X x X x [x] ( ) x X y x y [x] = [y] ( ) x X y y x ( ˆX) X ˆX X x x z x X x ˆX [z x ] X ˆX X ˆX ( ˆX ) (0) X x, y d(x(1), y(1)), d(x

( ) X x, y x y x y X x X x [x] ( ) x X y x y [x] = [y] ( ) x X y y x ( ˆX) X ˆX X x x z x X x ˆX [z x ] X ˆX X ˆX ( ˆX ) (0) X x, y d(x(1), y(1)), d(x Z Z Ẑ 1 1.1 (X, d) X x 1, x 2,, x n, x x n x(n) ( ) X x x ε N N i, j i, j d(x(i), x(j)) < ε ( ) X x x n N N i i d(x(n), x(i)) < 1 n ( ) X x lim n x(n) X x X () X x, y lim n d(x(n), y(n)) = 0 x y x y 1

More information

4

4 4 r r 43 44 a b c f d e a r b c d e f 45 r r r 46 47 a b g a b r c d e f r g c d e f e 48 mm r r 1 49 a r b c a b 1 1 a 3 a 50 1 a 3 1 mb a 1 mm 3 a a a 51 1 mm 1 mm 1 5 mb 3 4 1 3 4 1 53 1 1 mj r 1 a

More information

2/66

2/66 1/66 9 Outline 1. 2. 3. 4. CPU 5. Jun. 13, 2013@A 2/66 3/66 4/66 Network Memory Memory Memory CPU SIMD if Cache CPU Cache CPU Cache CPU 5/66 FPU FPU Floating Processing Unit Register Register Register

More information

07-二村幸孝・出口大輔.indd

07-二村幸孝・出口大輔.indd GPU Graphics Processing Units HPC High Performance Computing GPU GPGPU General-Purpose computation on GPU CPU GPU GPU *1 Intel Quad-Core Xeon E5472 3.0 GHz 2 6 MB L2 cache 1600 MHz FSB 80 GFlops 1 nvidia

More information

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h 23 FPGA CUDA Performance Comparison of FPGA Array with CUDA on Poisson Equation (lijiang@sekine-lab.ei.tuat.ac.jp), (kazuki@sekine-lab.ei.tuat.ac.jp), (takahashi@sekine-lab.ei.tuat.ac.jp), (tamukoh@cc.tuat.ac.jp),

More information

Part () () Γ Part ,

Part () () Γ Part , Contents a 6 6 6 6 6 6 6 7 7. 8.. 8.. 8.3. 8 Part. 9. 9.. 9.. 3. 3.. 3.. 3 4. 5 4.. 5 4.. 9 4.3. 3 Part. 6 5. () 6 5.. () 7 5.. 9 5.3. Γ 3 6. 3 6.. 3 6.. 3 6.3. 33 Part 3. 34 7. 34 7.. 34 7.. 34 8. 35

More information

°ÌÁê¿ô³ØII

°ÌÁê¿ô³ØII July 14, 2007 Brouwer f f(x) = x x f(z) = 0 2 f : S 2 R 2 f(x) = f( x) x S 2 3 3 2 - - - 1. X x X U(x) U(x) x U = {U(x) x X} X 1. U(x) A U(x) x 2. A U(x), A B B U(x) 3. A, B U(x) A B U(x) 4. A U(x),

More information

統計学のポイント整理

統計学のポイント整理 .. September 17, 2012 1 / 55 n! = n (n 1) (n 2) 1 0! = 1 10! = 10 9 8 1 = 3628800 n k np k np k = n! (n k)! (1) 5 3 5 P 3 = 5! = 5 4 3 = 60 (5 3)! n k n C k nc k = npk k! = n! k!(n k)! (2) 5 3 5C 3 = 5!

More information

XACCの概要

XACCの概要 2 global void kernel(int a[max], int llimit, int ulimit) {... } : int main(int argc, char *argv[]){ MPI_Int(&argc, &argc); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size); dx

More information

rank ”«‘‚“™z‡Ì GPU ‡É‡æ‡éŁÀŠñ›»

rank ”«‘‚“™z‡Ì GPU ‡É‡æ‡éŁÀŠñ›» rank GPU ERATO 2011 11 1 1 / 26 GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26 GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26 GPU rank/select wavelet tree balanced

More information

i

i 14 i ii iii iv v vi 14 13 86 13 12 28 14 16 14 15 31 (1) 13 12 28 20 (2) (3) 2 (4) (5) 14 14 50 48 3 11 11 22 14 15 10 14 20 21 20 (1) 14 (2) 14 4 (3) (4) (5) 12 12 (6) 14 15 5 6 7 8 9 10 7

More information

..3. Ω, Ω F, P Ω, F, P ). ) F a) A, A,..., A i,... F A i F. b) A F A c F c) Ω F. ) A F A P A),. a) 0 P A) b) P Ω) c) [ ] A, A,..., A i,... F i j A i A

..3. Ω, Ω F, P Ω, F, P ). ) F a) A, A,..., A i,... F A i F. b) A F A c F c) Ω F. ) A F A P A),. a) 0 P A) b) P Ω) c) [ ] A, A,..., A i,... F i j A i A .. Laplace ). A... i),. ω i i ). {ω,..., ω } Ω,. ii) Ω. Ω. A ) r, A P A) P A) r... ).. Ω {,, 3, 4, 5, 6}. i i 6). A {, 4, 6} P A) P A) 3 6. ).. i, j i, j) ) Ω {i, j) i 6, j 6}., 36. A. A {i, j) i j }.

More information

1009.\1.\4.ai

1009.\1.\4.ai - 1 - E O O O O O O - 2 - E O O O - 3 - O N N N N N N N N N N N N N N N N N N N N N N N E e N N N N N N N N N N N N N N N N N N N N N N N D O O O - 4 - O O O O O O O O N N N N N N N N N N N N N N N N N

More information

1 4 1.1........................................... 4 1.2.................................. 4 1.3................................... 4 2 5 2.1 GPU.....

1 4 1.1........................................... 4 1.2.................................. 4 1.3................................... 4 2 5 2.1 GPU..... CPU GPU N Q07-065 2011 2 17 1 1 4 1.1........................................... 4 1.2.................................. 4 1.3................................... 4 2 5 2.1 GPU...........................................

More information

2 Chapter 4 (f4a). 2. (f4cone) ( θ) () g M. 2. (f4b) T M L P a θ (f4eki) ρ H A a g. v ( ) 2. H(t) ( )

2 Chapter 4 (f4a). 2. (f4cone) ( θ) () g M. 2. (f4b) T M L P a θ (f4eki) ρ H A a g. v ( ) 2. H(t) ( ) http://astr-www.kj.yamagata-u.ac.jp/~shibata f4a f4b 2 f4cone f4eki f4end 4 f5meanfp f6coin () f6a f7a f7b f7d f8a f8b f9a f9b f9c f9kep f0a f0bt version feqmo fvec4 fvec fvec6 fvec2 fvec3 f3a (-D) f3b

More information

1. A0 A B A0 A : A1,...,A5 B : B1,...,B

1. A0 A B A0 A : A1,...,A5 B : B1,...,B 1. A0 A B A0 A : A1,...,A5 B : B1,...,B12 2. 3. 4. 5. A0 A, B Z Z m, n Z m n m, n A m, n B m=n (1) A, B (2) A B = A B = Z/ π : Z Z/ (3) A B Z/ (4) Z/ A, B (5) f : Z Z f(n) = n f = g π g : Z/ Z A, B (6)

More information

smpp_resume.dvi

smpp_resume.dvi 6 mmiki@mail.doshisha.ac.jp Parallel Processing Parallel Pseudo-parallel Concurrent 1) 1/60 1) 1997 5 11 IBM Deep Blue Deep Blue 2) PC 2000 167 Rank Manufacturer Computer Rmax Installation Site Country

More information

openmp1_Yaguchi_version_170530

openmp1_Yaguchi_version_170530 並列計算とは /OpenMP の初歩 (1) 今 の内容 なぜ並列計算が必要か? スーパーコンピュータの性能動向 1ExaFLOPS 次世代スハ コン 京 1PFLOPS 性能 1TFLOPS 1GFLOPS スカラー機ベクトル機ベクトル並列機並列機 X-MP ncube2 CRAY-1 S-810 SR8000 VPP500 CM-5 ASCI-5 ASCI-4 S3800 T3E-900 SR2201

More information

( )/2 hara/lectures/lectures-j.html 2, {H} {T } S = {H, T } {(H, H), (H, T )} {(H, T ), (T, T )} {(H, H), (T, T )} {1

( )/2   hara/lectures/lectures-j.html 2, {H} {T } S = {H, T } {(H, H), (H, T )} {(H, T ), (T, T )} {(H, H), (T, T )} {1 ( )/2 http://www2.math.kyushu-u.ac.jp/ hara/lectures/lectures-j.html 1 2011 ( )/2 2 2011 4 1 2 1.1 1 2 1 2 3 4 5 1.1.1 sample space S S = {H, T } H T T H S = {(H, H), (H, T ), (T, H), (T, T )} (T, H) S

More information

68 A mm 1/10 A. (a) (b) A.: (a) A.3 A.4 1 1

68 A mm 1/10 A. (a) (b) A.: (a) A.3 A.4 1 1 67 A Section A.1 0 1 0 1 Balmer 7 9 1 0.1 0.01 1 9 3 10:09 6 A.1: A.1 1 10 9 68 A 10 9 10 9 1 10 9 10 1 mm 1/10 A. (a) (b) A.: (a) A.3 A.4 1 1 A.1. 69 5 1 10 15 3 40 0 0 ¾ ¾ É f Á ½ j 30 A.3: A.4: 1/10

More information

untitled

untitled ( œ ) œ 138,800 17 171,000 60,000 16,000 252,500 405,400 24,000 22 95,800 24 46,000 16,000 16,000 273,000 19,000 10,300 57,800 1,118,408,500 1,118,299,000 109,500 102,821,836 75,895,167 244,622 3,725,214

More information

5 Armitage x 1,, x n y i = 10x i + 3 y i = log x i {x i } {y i } 1.2 n i i x ij i j y ij, z ij i j 2 1 y = a x + b ( cm) x ij (i j )

5 Armitage x 1,, x n y i = 10x i + 3 y i = log x i {x i } {y i } 1.2 n i i x ij i j y ij, z ij i j 2 1 y = a x + b ( cm) x ij (i j ) 5 Armitage. x,, x n y i = 0x i + 3 y i = log x i x i y i.2 n i i x ij i j y ij, z ij i j 2 y = a x + b 2 2. ( cm) x ij (i j ) (i) x, x 2 σ 2 x,, σ 2 x,2 σ x,, σ x,2 t t x * (ii) (i) m y ij = x ij /00 y

More information

...J......1803.QX

...J......1803.QX 5 7 9 11 13 15 17 19 21 23 45-1111 48-2314 1 I II 100,000 80,000 60,000 40,000 20,000 0 272,437 80,348 82,207 81,393 82,293 83,696 84,028 82,232 248,983 80,411 4,615 4,757 248,434 248,688 76,708 6,299

More information

2 G(k) e ikx = (ik) n x n n! n=0 (k ) ( ) X n = ( i) n n k n G(k) k=0 F (k) ln G(k) = ln e ikx n κ n F (k) = F (k) (ik) n n= n! κ n κ n = ( i) n n k n

2 G(k) e ikx = (ik) n x n n! n=0 (k ) ( ) X n = ( i) n n k n G(k) k=0 F (k) ln G(k) = ln e ikx n κ n F (k) = F (k) (ik) n n= n! κ n κ n = ( i) n n k n . X {x, x 2, x 3,... x n } X X {, 2, 3, 4, 5, 6} X x i P i. 0 P i 2. n P i = 3. P (i ω) = i ω P i P 3 {x, x 2, x 3,... x n } ω P i = 6 X f(x) f(x) X n n f(x i )P i n x n i P i X n 2 G(k) e ikx = (ik) n

More information

n ξ n,i, i = 1,, n S n ξ n,i n 0 R 1,.. σ 1 σ i .10.14.15 0 1 0 1 1 3.14 3.18 3.19 3.14 3.14,. ii 1 1 1.1..................................... 1 1............................... 3 1.3.........................

More information

sec13.dvi

sec13.dvi 13 13.1 O r F R = m d 2 r dt 2 m r m = F = m r M M d2 R dt 2 = m d 2 r dt 2 = F = F (13.1) F O L = r p = m r ṙ dl dt = m ṙ ṙ + m r r = r (m r ) = r F N. (13.2) N N = R F 13.2 O ˆn ω L O r u u = ω r 1 1:

More information

量子力学 問題

量子力学 問題 3 : 203 : 0. H = 0 0 2 6 0 () = 6, 2 = 2, 3 = 3 3 H 6 2 3 ϵ,2,3 (2) ψ = (, 2, 3 ) ψ Hψ H (3) P i = i i P P 2 = P 2 P 3 = P 3 P = O, P 2 i = P i (4) P + P 2 + P 3 = E 3 (5) i ϵ ip i H 0 0 (6) R = 0 0 [H,

More information

( ) ( ) HPC SPH FPGA Web http://galaxy.u-aizu.ac.jp/trac/note/ : 1 4 : 2 6 : 3 6 GPU : ~ 100 1000 : ~ 1000-100000 Google : ~ 10000 : ~ 100000000 GPU, Cell, FPGA GRAPE-DR/GRAPE-MP ( ) GPU GPU : Matsumoto,

More information

ÊÂÎó·×»»¤È¤Ï/OpenMP¤Î½éÊâ¡Ê£±¡Ë

ÊÂÎó·×»»¤È¤Ï/OpenMP¤Î½éÊâ¡Ê£±¡Ë 2015 5 21 OpenMP Hello World Do (omp do) Fortran (omp workshare) CPU Richardson s Forecast Factory 64,000 L.F. Richardson, Weather Prediction by Numerical Process, Cambridge, University Press (1922) Drawing

More information

スライド 1

スライド 1 計算科学が拓く世界スーパーコンピュータは何故スーパーか 学術情報メディアセンター中島浩 http://www.para.media.kyoto-u.ac.jp/jp/ username=super password=computer 講義の概要 目的 計算科学に不可欠の道具スーパーコンピュータが どういうものか なぜスーパーなのか どう使うとスーパーなのかについて雰囲気をつかむ 内容 スーパーコンピュータの歴史を概観しつつ

More information

iii iv v vi 21 A B A B C C 1 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 10 11 19 22 30 39 43 48 54 60 65 74 77 84 87 89 95 101 12 20 23 31 40 44 49 55 61 66 75 78 85 88 90 96 102 13 21 24 32 41 45 50 56 62 67 76 79

More information

26102 (1/2) LSISoC: (1) (*) (*) GPU SIMD MIMD FPGA DES, AES (2/2) (2) FPGA(8bit) (ISS: Instruction Set Simulator) (3) (4) LSI ECU110100ECU1 ECU ECU ECU ECU FPGA ECU main() { int i, j, k for { } 1 GP-GPU

More information

HPC146

HPC146 2 3 4 5 6 int array[16]; #pragma xmp nodes p(4) #pragma xmp template t(0:15) #pragma xmp distribute t(block) on p #pragma xmp align array[i] with t(i) array[16] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Node

More information

表1票4.qx4

表1票4.qx4 iii iv v 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 22 23 10 11 24 25 26 27 10 56 28 11 29 30 12 13 14 15 16 17 18 19 2010 2111 22 23 2412 2513 14 31 17 32 18 33 19 34 20 35 21 36 24 37 25 38 2614

More information

ストリーミング SIMD 拡張命令2 (SSE2) を使用した、倍精度浮動小数点ベクトルの最大/最小要素とそのインデックスの検出

ストリーミング SIMD 拡張命令2 (SSE2) を使用した、倍精度浮動小数点ベクトルの最大/最小要素とそのインデックスの検出 SIMD 2(SSE2) / 2.0 2000 7 : 248602J-001 01/10/30 1 305-8603 115 Fax: 0120-47-8832 * Copyright Intel Corporation 1999-2001 01/10/30 2 1...5 2...5 2.1...5 2.1.1...5 2.1.2...8 3...9 3.1...9 3.2...9 4...9

More information

スライド 1

スライド 1 東北大学工学部機械知能 航空工学科 2019 年度クラス C D 情報科学基礎 I 14. さらに勉強するために 大学院情報科学研究科 鏡慎吾 http://www.ic.is.tohoku.ac.jp/~swk/lecture/ 0 と 1 の世界 これまで何を学んだか 2 進数, 算術演算, 論理演算 計算機はどのように動くのか プロセッサとメモリ 演算命令, ロード ストア命令, 分岐命令 計算機はどのように構成されているのか

More information

VLSI工学

VLSI工学 2008//5/ () 2008//5/ () 2 () http://ssc.pe.titech.ac.jp 2008//5/ () 3!! A (WCDMA/GSM) DD DoCoMo 905iP905i 2008//5/ () 4 minisd P900i SemiConsult SDRAM, MPEG4 UIMIrDA LCD/ AF ADC/DAC IC CCD C-CPUA-CPU DSPSRAM

More information

GPUコンピューティング講習会パート1

GPUコンピューティング講習会パート1 GPU コンピューティング (CUDA) 講習会 GPU と GPU を用いた計算の概要 丸山直也 スケジュール 13:20-13:50 GPU を用いた計算の概要 担当丸山 13:50-14:30 GPU コンピューティングによる HPC アプリケーションの高速化の事例紹介 担当青木 14:30-14:40 休憩 14:40-17:00 CUDA プログラミングの基礎 担当丸山 TSUBAME の

More information

1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit)

1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit) GNU MP BNCpack tkouya@cs.sist.ac.jp 2002 9 20 ( ) Linux Conference 2002 1 1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit) 10 2 2 3 4 5768:9:; = %? @BADCEGFH-I:JLKNMNOQP R )TSVU!" # %$ & " #

More information

エクセルカバー入稿用.indd

エクセルカバー入稿用.indd i 1 1 2 3 5 5 6 7 7 8 9 9 10 11 11 11 12 2 13 13 14 15 15 16 17 17 ii CONTENTS 18 18 21 22 22 24 25 26 27 27 28 29 30 31 32 36 37 40 40 42 43 44 44 46 47 48 iii 48 50 51 52 54 55 59 61 62 64 65 66 67 68

More information

untitled

untitled 2005 2 1 105-0004 5-34-3 Tel: 03-3431-4002 Fax: 03-3431-4044 1 SRL/ISTEC 1 1 SFQ SFQ SFQ 2004 9 4 SFQ SFQ / LSI 269 230 230 230 269 230 SFQ SFQ 2005 2 ISTEC 2005 All rights reserved. - 1 - 2005 2 1 105-0004

More information

スライド 1

スライド 1 演算精度に応じた高性能計算を実現するコンパイラの提案と実装 会津大学中里直人 概要 No.2 問題設定 アクセラレータの紹介 問題特化型のコンパイラ 性能評価 GRAPE-DRでの性能評価 RV770での性能評価 他の応用例 発展のアイデア Grand Challenge problems No.3 Grand Challenge problems No.4 Simulations with very

More information

( š ) š 13,448 1,243,000 1,249,050 1,243,000 1,243,000 1,249,050 1,249, , , ,885

( š ) š 13,448 1,243,000 1,249,050 1,243,000 1,243,000 1,249,050 1,249, , , ,885 ( š ) 7,000,000 191 191 6,697,131 5,845,828 653,450 197,853 4,787,707 577,127 4,000,000 146,580 146,580 64,000 100,000 500,000 120,000 60,000 60,000 60,000 60,000 60,000 200,000 150,000 60,000 60,000 100,000

More information

2

2 GPU 2008/11/30 GPU GPU UniformGrid GPU CPU GeForce6 9 kd-tree GPU GPU UG kd-tree GPU CPU GPU GPU GPU I/O PCI-Express DMA DirectX9 DirectX 3D OpenGL CUDA Larrabee Mac 2008/11/28 Mac(Carbon) Carbon.framework/QuickTime.framework

More information

2

2 ( ) 1 2 3 1.CPU, 2.,,,,,, 3. register, register, 4.L1, L2, (L3), (L4) 4 register L1 cache L2 cache Main Memory,, L2, L1 CPU L2, L1, CPU 5 , 6 dgem2vu 7 ? Wiedemann algorithm u 0, w 0, s i, s i = u 0 Ai

More information

01_.g.r..

01_.g.r.. I II III IV V VI VII VIII IX X XI I II III IV V I I I II II II I I YS-1 I YS-2 I YS-3 I YS-4 I YS-5 I YS-6 I YS-7 II II YS-1 II YS-2 II YS-3 II YS-4 II YS-5 II YS-6 II YS-7 III III YS-1 III YS-2

More information

NEC All rights reserved 1

NEC All rights reserved 1 NEC All rights reserved 1 NEC All rights reserved 2 NEC All rights reserved 3 (Founder) (Langchao Langchao) NEC All rights reserved 4 2.1 GB/s 64 bits wide 266 MHz 4 MB L3 on board, 96k L2, 32k L1 on -die

More information

t = h x z z = h z = t (x, z) (v x (x, z, t), v z (x, z, t)) ρ v x x + v z z = 0 (1) 2-2. (v x, v z ) φ(x, z, t) v x = φ x, v z

t = h x z z = h z = t (x, z) (v x (x, z, t), v z (x, z, t)) ρ v x x + v z z = 0 (1) 2-2. (v x, v z ) φ(x, z, t) v x = φ x, v z I 1 m 2 l k 2 x = 0 x 1 x 1 2 x 2 g x x 2 x 1 m k m 1-1. L x 1, x 2, ẋ 1, ẋ 2 ẋ 1 x = 0 1-2. 2 Q = x 1 + x 2 2 q = x 2 x 1 l L Q, q, Q, q M = 2m µ = m 2 1-3. Q q 1-4. 2 x 2 = h 1 x 1 t = 0 2 1 t x 1 (t)

More information