GRAPE GRAPE-DR V-GRAPE
1 GRAPE-DR / 2006/11/20-22
2 GRAPE GRAPE-DR V-GRAPE
3
4
5
6
7 ( ) SDSS
8 Genzel et al. 2003, adaptive optics, SgrA ( )
9 12 1
10 :
11 GRAPE: (Barnes-Hut tree, FMM, Particle-Mesh Ewald (PPPM) ...): ( )
12 1988
13 GRAPE-1 (1989) Mflops
14 GRAPE-2 (1990) 8 ( ) 40 Mflops
15 GRAPE-3 (1991) MHz 7.2 Gflops
16 GRAPE-3 1 µm MHz 600 Mflops
17 GRAPE-4 (1995) Tflops
18 GRAPE-4 chip pipeline (diagram: inputs X_i, X_j, V_i, V_j, m_j; stages r², r·v, function evaluation (sqrt, m/r, m/r³, m/r⁵, cutoffs P_cut, F_cut); outputs F_i, P_i, J_i) 1 µm, 10 (40), 640 Mflops
19 GRAPE-6 (2002) Tflops
20 Pipeline LSI: 0.25 µm design rule (Toshiba TC-240, 1.8M gates), 90 MHz operation, 6 pipelines integrated, 31 Gflops per chip
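Slide 20 gives 6 pipelines, a 90 MHz clock and 31 Gflops per chip. If each pipeline completes one pairwise interaction per clock cycle (our assumption), those figures imply roughly 57 flops credited per interaction. A minimal sketch of the arithmetic; the function names are illustrative:

```c
/* Peak rate of a pipelined force chip, assuming every pipeline
   finishes one pairwise interaction per clock cycle. */
double pipeline_peak_gflops(int pipelines, double clock_hz,
                            double flops_per_interaction) {
    return pipelines * clock_hz * flops_per_interaction / 1e9;
}

/* Inverse: the flop count per interaction implied by a quoted peak. */
double implied_flops_per_interaction(double peak_gflops, int pipelines,
                                     double clock_hz) {
    return peak_gflops * 1e9 / (pipelines * clock_hz);
}
```

For example, `implied_flops_per_interaction(31.0, 6, 90.0e6)` gives about 57.4, and `pipeline_peak_gflops(6, 90.0e6, 57.0)` gives about 30.8 Gflops.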
21 2006: GRAPE-6 chip vs. Core 2 Extreme
  Process: 250 nm vs. 65 nm
  Clock: 90 MHz vs. 2.93 GHz
  Peak: 32.4 Gflops vs. 23.44 Gflops
  Power: 10 W vs. 75 W
  Per 1 W: 3.24 Gflops vs. ≈0.31 Gflops
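The per-watt row is simply peak divided by power. The sketch below also rebuilds the Core 2 Extreme peak under our assumption that it is a dual-core part issuing 4 double-precision flops per cycle per core: 2.93 GHz × 2 × 4 = 23.44 Gflops, which matches the table.

```c
/* Performance per watt from a peak rate and a power figure. */
double gflops_per_watt(double peak_gflops, double watts) {
    return peak_gflops / watts;
}

/* Peak of a multicore CPU: clock (GHz) x cores x flops per cycle per core.
   The core count and flops/cycle are assumptions, not from the slide. */
double cpu_peak_gflops(double clock_ghz, int cores, int flops_per_cycle) {
    return clock_ghz * cores * flops_per_cycle;
}
```

With these numbers the GRAPE-6 chip comes out roughly ten times more power-efficient (3.24 vs. about 0.31 Gflops/W).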
22 GRAPE-4
23 GRAPE-6 MDGRAPE-3 : MDGRAPE-4, 20Pflops@2010 MDGRAPE-3 GRAPE-DR
24 GRAPE-DR GRAPE : 2 Petaflops Tflops GRAPE : GRAPE
25 GRAPE ( ( N )) µm µm nm nm 10
26 1.
27 1. 2.
28
29 GRAPE-DR (3)
30 1
31 : ( ) 1. GRAPE SIMD
32 SIMD SIMD (Single Instruction Multiple Data): GRAPE
33 SIMD SIMD SSE MMX SIMD GRAPE-DR SIMD
34 SIMD computers: Illiac IV, Goodyear MPP, ICL DAP, TMC CM-2, MASPAR MP-1 (diagram: a row of processing units, each with its own ALU, registers (REG) and memory (MEM)) : : SIMD
35 SIMD in Pentium III, ... (diagram: registers R0 to R7, each split into words W0 to W3, processed by ALU0 to ALU3; one instruction performs four operations, 1 : 4)
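The 1 : 4 ratio on slide 35 means one instruction drives four ALUs at once. The sketch below models a 128-bit register as four float lanes and counts how many vector instructions an array addition needs. `vreg`, `vadd` and `simd_add_arrays` are illustrative names; real code on a Pentium III would use SSE instructions rather than a lane loop.

```c
#include <stddef.h>

#define LANES 4                         /* four 32-bit words per 128-bit register */

typedef struct { float w[LANES]; } vreg; /* one SIMD register: words W0..W3 */

/* One "instruction": the same add is applied in every lane (ALU0..ALU3). */
vreg vadd(vreg a, vreg b) {
    vreg r;
    for (size_t k = 0; k < LANES; k++)
        r.w[k] = a.w[k] + b.w[k];
    return r;
}

/* Add two float arrays in chunks of LANES; the last chunk is zero-padded.
   Returns the number of vector instructions issued. */
size_t simd_add_arrays(size_t n, const float *a, const float *b, float *out) {
    size_t issued = 0;
    for (size_t base = 0; base < n; base += LANES) {
        vreg va = {{0}}, vb = {{0}};
        size_t cnt = (n - base < LANES) ? n - base : LANES;
        for (size_t k = 0; k < cnt; k++) {
            va.w[k] = a[base + k];
            vb.w[k] = b[base + k];
        }
        vreg vr = vadd(va, vb);         /* one instruction, four operations */
        issued++;
        for (size_t k = 0; k < cnt; k++)
            out[base + k] = vr.w[k];
    }
    return issued;
}
```

Adding 5 elements takes 2 vector instructions instead of 5 scalar ones.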
36 GRAPE-DR: SIMD ... (FPGA ...) ... SING ... processing element (PE); 1 PE = ... + ... ( ); (PE) PE ... broadcast block (BB)
37 (diagram of one PE: local memory (M) of 256 words, 32-word register file, PEID and BBID registers, A, B and T registers, multiplier (x), adder (+), ALU)
38
39 32PE( ) 16 18mm
40 GRAPE-DR 500MHz 100 Gflops ( )
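Slide 39 appears to give 32 PEs × 16 and slide 40 a 500 MHz clock. The sketch below multiplies these out; the flops-per-PE-per-cycle factor is our assumption. With one double-precision operation per PE per cycle it gives 256 Gflops, the figure quoted on slide 48; how the 100 Gflops on slide 40 was counted is not recoverable from the transcription.

```c
/* Peak rate of a SIMD chip: PEs per block x blocks x clock x flops per
   PE per cycle (the last factor is an assumption, not from the slides). */
double simd_peak_gflops(int pes_per_block, int blocks, double clock_hz,
                        double flops_per_pe_cycle) {
    return pes_per_block * blocks * clock_hz * flops_per_pe_cycle / 1e9;
}
```

`simd_peak_gflops(32, 16, 500.0e6, 1.0)` returns 256.0; doubling the per-cycle factor gives 512.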
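41 GRAPE-DR, a separate board: this one is the project's official board. Its contents are almost the same, but for some reason it is larger. LINPACK apparently ran on it.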
42 GRAPE
43 : $g_i = \sum_j f(x_i, x_j)$, i.e. for every i, sum over all j ( )
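The GRAPE model on slide 43 reduces to one kernel shape: for every i, accumulate f(x_i, x_j) over all j. A minimal C sketch with scalar positions and a user-supplied f; `grape_sum` and `pair_product` are hypothetical names, and GRAPE hardware evaluates f in fixed pipelines rather than through a function pointer.

```c
#include <stddef.h>

/* Pairwise interaction f(x_i, x_j), supplied by the caller. */
typedef double (*pair_fn)(double a, double b);

/* Example interaction used for testing: f(a, b) = a * b. */
double pair_product(double a, double b) { return a * b; }

/* g[i] = sum over j of f(xi[i], xj[j]). */
void grape_sum(size_t ni, const double *xi, size_t nj, const double *xj,
               pair_fn f, double *g) {
    for (size_t i = 0; i < ni; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < nj; j++)
            acc += f(xi[i], xj[j]);
        g[i] = acc;
    }
}
```

The cost is O(N_i × N_j), which is why the host keeps the O(N) or O(N log N) bookkeeping (tree, FMM, PPPM) and offloads only this sum.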
44 ( 2006)
/VARI xi, yi, zi, e2;
/VARJ xj, yj, zj, mj;
/VARF fx, fy, fz;
dx = xi - xj;
dy = yi - yj;
dz = zi - zj;
r2 = dx*dx + dy*dy + dz*dz + e2;
r3i = powm32(r2);
ff = mj*r3i;
fx += ff*dx;
fy += ff*dy;
fz += ff*dz;
GRAPE PGR (FPGA PROGRAPE D 2006)
45 /
int SING_send_j_particle(struct grape_j_particle_struct *jp, int index_in_em);
int SING_send_i_particle(struct grape_i_particle_struct *ip, int n);
int SING_get_result(struct grape_result_struct *rp);
void SING_grape_init();
int SING_grape_run(int n);
46 2 ( )
47 V-GRAPE GRAPE-DR = V-GRAPE
48 GRAPE-DR 256Gflops MDGRAPE-3 FPGA FFT CG 2
49 FFT CG :
50 FFT FFT FFT : 10 log n 4GB/s 10 Gflops CPU
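Slide 50 is about FFT being limited by memory bandwidth (4 GB/s, about 10 Gflops). One way to see this is a simple roofline bound: attainable rate = min(peak, arithmetic intensity × bandwidth). The sketch uses the conventional 5 n log2 n flop count and assumes each 16-byte complex value is read once and written once, which may differ from the slide's own 10 log n figure; the function names are ours.

```c
/* Roofline bound: min(peak, flops-per-byte x bandwidth). */
double roofline_gflops(double peak_gflops, double flops_per_byte,
                       double bandwidth_gbs) {
    double mem_bound = flops_per_byte * bandwidth_gbs;
    return mem_bound < peak_gflops ? mem_bound : peak_gflops;
}

/* Arithmetic intensity of an n = 2^log2n point complex FFT, assuming
   5 n log2(n) flops and 2 x 16 bytes of memory traffic per point. */
double fft_intensity(unsigned log2n) {
    double n = (double)(1ULL << log2n);
    double flops = 5.0 * n * log2n;
    double bytes = 2.0 * 16.0 * n;
    return flops / bytes;
}
```

For n = 2^20 the intensity is 3.125 flops/byte, so at 4 GB/s the bound is 12.5 Gflops regardless of a 256 Gflops peak.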
51 CG : O(10)
52 GRAPE-DR: 1 MB; Intel Itanium: 24 MB?; DRAM: 32 MB
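The 1 MB figure for GRAPE-DR is consistent with 512 PEs (32 × 16) each holding 256 words, if "256W" on slide 37 is read as a 256-word local memory per PE and words are counted as 8 bytes; both readings are our assumptions. Counting the 72-bit words suggested by the flt64to72 conversions in the assembly listing as 9 bytes gives slightly more.

```c
/* On-chip memory = PEs x words per PE x bytes per word. */
unsigned long onchip_bytes(unsigned pes, unsigned words_per_pe,
                           unsigned bytes_per_word) {
    return (unsigned long)pes * words_per_pe * bytes_per_word;
}
```

`onchip_bytes(512, 256, 8)` is 1,048,576 bytes (1 MiB); with 9-byte words it is 1,179,648.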
53 V-GRAPE (diagram: array of PEs); GRAPE-DR vs. V-GRAPE PE
54 V-GRAPE / : ( ) :
55 : PIC
56 PIC: charge assignment. Charge assignment: GRAPE-DR :
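Charge assignment is the PIC step in which every particle deposits its charge onto nearby mesh points. The slide does not survive in enough detail to say which scheme was meant; the sketch assumes one-dimensional cloud-in-cell (CIC) on a periodic grid, and `cic_assign` is our name. Each particle writes to two grid points, so PEs handling different particles in parallel can collide on the same cell.

```c
#include <stddef.h>

/* 1D cloud-in-cell charge assignment on a periodic grid of ng points
   with spacing h. Positions are assumed to satisfy 0 <= x < ng * h.
   rho[g] receives the charge assigned to grid point g. */
void cic_assign(size_t np, const double *x, const double *q,
                size_t ng, double h, double *rho) {
    for (size_t g = 0; g < ng; g++)
        rho[g] = 0.0;
    for (size_t p = 0; p < np; p++) {
        double s = x[p] / h;            /* position in grid units */
        size_t i0 = (size_t)s;          /* left grid point (floor, s >= 0) */
        double frac = s - (double)i0;   /* distance past the left point */
        i0 %= ng;                       /* guard against s rounding up to ng */
        size_t i1 = (i0 + 1) % ng;      /* right grid point, periodic wrap */
        rho[i0] += q[p] * (1.0 - frac); /* scatter: two writes per particle */
        rho[i1] += q[p] * frac;
    }
}
```

A unit charge at x = 1.25 (h = 1) splits 0.75 / 0.25 between points 1 and 2.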
57 Charge 1 : 50 ( ) 1 : 12 ( ) : 1 4 GRAPE-DR : 2 GB/s 8 Gflops
58 GRAPE-DR GPGPU V-GRAPE
59 GPGPU nvidia 8800: C 768MB 90GB/s(SX-7 3 ) GPU C 400Gflops 1 (8 )
60 V-GRAPE 128MB GB/s 1.5Tflops (50 )
61 : = 3 10 GRAPE-DR 100Gflops
62 GRAPE LSI GRAPE-DR SIMD GRAPE V-GRAPE PIC GRAPE-DR GPGPU V-GRAPE
63
64 Memory Wall : : : :
65 1990 I/O
66
67 : 30
68 V-GRAPE BLAS, LAPACK PE PGDL ( FPGA ) SPH ( 150)
69 :
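70 Flynn's taxonomy (M. Flynn): SISD/SIMD/MISD/MIMD, from single instruction stream (SI) or multiple instruction streams (MI) combined with single data stream (SD) or multiple data streams (MD). SIMD ... SIMD ( ) ... MIMD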
71 SIMD GRAPE ( ) : : ( ) : 1000 ( / )
72 (PE) (j- ) (diagram: a 2D grid of PEs, with i-particles assigned along the rows and j-particles along the columns) (GRAPE-6 ) 2 : 2
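Slide 72 sketches PEs laid out in two dimensions, with i-particles along one axis and j-particles along the other. The sketch emulates this on a CPU: PE (r, c) sums over its share of j-particles for its share of i-particles, and the partial sums are then reduced across each row. The cyclic assignment (i % rows, j % cols) and the names `grid_sum` and `pair_product` are our assumptions; the slide does not say how GRAPE-6 maps particles to PEs.

```c
#include <stddef.h>

typedef double (*pair_fn)(double a, double b);

/* Example interaction used for testing: f(a, b) = a * b. */
double pair_product(double a, double b) { return a * b; }

/* Emulate a rows x cols grid of PEs. PE (r, c) handles i-particles with
   i % rows == r and j-particles with j % cols == c; its partial sums are
   reduced along the row into g[i]. */
void grid_sum(size_t ni, const double *xi, size_t nj, const double *xj,
              size_t rows, size_t cols, pair_fn f, double *g) {
    for (size_t i = 0; i < ni; i++)
        g[i] = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)          /* one PE */
            for (size_t i = r; i < ni; i += rows) {
                double partial = 0.0;              /* PE-local accumulator */
                for (size_t j = c; j < nj; j += cols)
                    partial += f(xi[i], xj[j]);
                g[i] += partial;                   /* reduction along the row */
            }
}
```

The result equals the plain double loop; what changes is that each PE only needs the i- and j-data for its row and column, which is what lets the hardware broadcast j-particles down columns.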
73 (diagram: four groups of PEs, each group sharing a broadcast memory, connected to a memory controller/host) ( )
74 SING: Sing Is Not GRAPE (board diagram: four SING chips, each paired with a control processor (CP) in an FPGA and its own DRAM; a further FPGA provides the host interface, PCI-X/PCIe, PCI)
75 GRAPE : SIMD GDR : (FPGA ) =
76 PE PE ( )
77
var vector long xi hlt flt64to72
var vector long yi hlt flt64to72
var vector long zi hlt flt64to72
var vector short idxi hlt fix32to36ru
bvar long xj elt flt64to72
bvar long yj elt flt64to72
bvar long zj elt flt64to72
bvar long vxj xj
bvar short mj elt flt64to36
bvar short eps2 elt flt64to36
bvar short idxj elt fix32to36ru
var short lmj
var short leps2
var short lidxj
var vector long accx rrn flt72to64 fadd
var vector long accy rrn flt72to64 fadd
var vector long accz rrn flt72to64 fadd
var vector long pot rrn flt72to64 fadd
hlt, elt, rrn
78
loop initialization
vlen 4
uxor $t $t $t
upassa $ti $ti $lr40v
upassa $t $t $lr48v
upassa $t $t $lr56v
upassa $t $t pot
loop body
vlen 3
bm vxj $lr0v
vlen 1
bm mj lmj
bm eps2 leps2
bm idxj lidxj
79
vlen 4
nop
upassa idxi idxi $t
uxor $ti lidxj $t
moi 2
ulnot $ti $ti $t # mreg 1 indicates i != j
moi 0
nop
fsub $lr0 xi $r6v $t
fsub $lr2 yi $r10v ; fmul $ti $ti $t
fsub $lr4 zi $r14v
fmul $r10v $r10v $r18v ; fadd $t leps2 $t
fmul $r14v $r14v ; fadd $fb $ti $t
fadd $fb $ti $r18v $t # rsq is now in r18 t, dx, dy,dz are in 2
80 ( )
ulsr $ti il"60" $t $lr22v
ulsr $ti il"1" $t
uadd $ti $lr22v $t
usub hl"9fd" $ti $t # $lr8v 1.5
ulsl $ti il"60" $lr30v
moi 1
uand il"1" $lr22v
moi 0
uand $r18v h"000ffffff" $t
uor $ti h"3ff000000" $t
fmul $ti f"0.57" $t
fsub f"1.57" $ti $t
mi 1
fmul f"1.414" $ti $t
mi 0
nop
fmul $t $lr30v $t $r22v # Here the result is the initial guess for r^-3
81 ( )
fmul $r18v $r18v $r26v $t
fmul $r18v $ti $r26v $t
fmul $ti f"0.5" $r26v # r26v is a**3/2
fmul $r22v $r22v $t
fmul $ti $r26v $t
fsub f"1.5" $ti $t
fmul $r22v $ti $t $r22v
fmul $ti $ti $t
fmul $ti $r26v $t
fsub f"1.5" $ti $t
fmul $r22v $ti $t $r22v
fmul $ti $ti $t
fmul $ti $r26v $t
fsub f"0.5" $ti $t
fmul $r22v $ti $t
fadd $r22v $ti $t
fmul lmj $ti $t $r22v
82 ( )
mi 2
fmul $r6v $ti ; upassa pot pot $lr0v
fmul $r10v $t ; fadd $fb $lr40v $lr40v accx
fmul $r14v $t ; fadd $fb $lr48v $lr48v accy
fmul $r18v $t ; fadd $fb $lr56v $lr56v accz
fadd $fb $lr0v pot
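Slides 80 and 81 appear to compute r^-3 without a divide or square-root unit: the mantissa of r² is isolated by masking and forcing the exponent field to 0x3ff (the `uand ... h"000ffffff"` / `uor ... h"3ff000000"` pair), a linear fit 1.57 - 0.57m plus a 1.414 factor for odd exponents gives a first guess, and Newton steps x ← x(1.5 - (a³/2)x²) refine it, with r26v holding a³/2 as the comment says. The sketch below redoes this on IEEE 754 doubles; the 64-bit bit layout (52-bit fraction, bias 1023), the iteration count and the function names are our assumptions, not the GRAPE-DR 72-bit format.

```c
#include <stdint.h>
#include <string.h>

/* Build 2^n directly from the exponent bits; valid for -1022 <= n <= 1023. */
static double pow2_bits(int n) {
    uint64_t bits = (uint64_t)(n + 1023) << 52;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Approximate a^(-3/2) for a positive, normal a of moderate size
   (a*a*a must not overflow), in the style of the GRAPE-DR listing. */
double inv_pow32(double a, int iterations) {
    uint64_t bits;
    memcpy(&bits, &a, sizeof bits);
    int e = (int)((bits >> 52) & 0x7ff) - 1023;      /* unbiased exponent */

    /* Mantissa m in [1, 2): keep the fraction bits, force exponent 0x3ff. */
    uint64_t mbits = (bits & 0x000fffffffffffffULL) | 0x3ff0000000000000ULL;
    double m;
    memcpy(&m, &mbits, sizeof m);

    double x = 1.57 - 0.57 * m;        /* linear fit to m^(-3/2) on [1, 2) */

    int odd = e & 1;
    int k = (e - odd) / 2;             /* e = 2k + odd */
    if (odd)
        x *= pow2_bits(-3 * k - 2) * 1.41421356;  /* 2^(-3k - 1.5) */
    else
        x *= pow2_bits(-3 * k);                   /* 2^(-3k) */

    double a3h = 0.5 * a * a * a;      /* a^3 / 2, like r26v */
    for (int i = 0; i < iterations; i++)
        x = x * (1.5 - a3h * x * x);   /* Newton step for x = a^(-3/2) */
    return x;
}
```

The first guess is within about 34% of the true value, which is inside Newton's convergence range for this iteration; the error then roughly squares each step, so a handful of steps reaches double precision. Multiplying the result by m_j gives the `ff` term of slide 44.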
83
int SING_send_j_particle(struct grape_j_particle_struct *jp, int index_in_em);
int SING_send_i_particle(struct grape_i_particle_struct *ip, int n);
int SING_get_result(struct grape_result_struct *rp);
void SING_grape_init();
int SING_grape_run(int n);
GRAPE-3/5
84
struct grape_j_particle_struct{
    double xj;
    double yj;
    double zj;
    double mj;
    double eps2;
    UINT32 idxj;
};
struct grape_i_particle_struct{
    double xi;
    double yi;
    double zi;
    UINT32 idxi;
};
struct grape_result_struct{
    double accx;
    double accy;
    double accz;
    double pot;
};
85 17mm
86
87 PE