HPC (pay-as-you-go) HPC Web


2 HPC (pay-as-you-go) HPC Web

3 HPC Amazon EC2 OpenFOAM GPU EC2

4 HPC MPI MPI Courant 1 GPGPU MPI

5 AMAZON EC2 GPU CLUSTER COMPUTE INSTANCE
  EC2 GPU (cg1.4xlarge) (N. Virginia)
  Quad-core Intel Xeon GHz x 2 (8 cores)
  22 GB Memory
  NVIDIA M2050 (2687 MB) x 2
  10 Gigabit Ethernet
  Amazon Linux AMI (RHEL base)
  $2.10 / hour / node

6 EC2
  YouTube: "Building a Cluster in Less Than Ten Minutes"
  sudo
  CUDA SDK, OpenFOAM, GPU
  Machine Image
  Web
  $0.10 / GB / month

7 EC2 WEB CONSOLE

8 PCC-GPU: APPRO GPU CLUSTER (in-house)
  GPU (pcc-gpu)
  Octo-core AMD Opteron 2.4 GHz x 2 (16 cores)
  32 GB Memory
  NVIDIA M2050 (2687 MB) x 2
  InfiniBand QDR
  CentOS ( 9 )

9 Intel MPI Benchmarks (IMB), OpenFOAM
  PingPong
  Allreduce (MPI_SUM, 8 bytes)

10 IMB: PINGPONG (2 NODES)
  [Figure: IMB PingPong (2 nodes); elapsed time [μsec] vs. message size [byte] / [Kbyte]; series: cg1.4xlarge, pcc-gpu]

11 IMB: ALLREDUCE (SUM, 8 BYTES)
  [Figure: IMB Allreduce (8 bytes); elapsed time [μsec] vs. number of nodes; series: cg1.4xlarge, pcc-gpu]

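12 NS (Navier-Stokes) equations:

\[
\nabla \cdot (\rho U) = 0, \qquad
(U \cdot \nabla) U - \nabla \cdot (\nu \nabla U) = -\nabla P
\]

Velocity at face f, from the discretized NS equations with diagonal coefficient a_p:

\[
U_f = \left( \frac{H(U)}{a_p} \right)_f - \frac{(\nabla P)_f}{(a_p)_f}
\]

Poisson equation for the pressure P:

\[
\nabla \cdot \left( \frac{1}{a_p} \nabla P \right)
= \nabla \cdot \left( \frac{H(U)}{a_p} \right)
= \sum_f S \cdot \left( \frac{H(U)}{a_p} \right)_f
\]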

13 SIMPLE
  Algorithm 1 SIMPLE: steps 1 to 9, with a repeat ... until loop from step 2 to step 9 and the PCG solve at step 5

14 PRECONDITIONED CG
  MPI
  - CUBLAS
  SpMV (Sparse Matrix-Vector product)
  - CUDA ITSOL (Li and Saad, 2012), JAD, MPI
  :
  - CUDA ITSOL, NVIDIA CUSP
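Preconditioned CG alternates inner products and vector updates, a sparse matrix-vector product, and a preconditioner application. The loop can be sketched in pure Python as below. This is only an illustration under stated assumptions, not the implementation used in the slides: a Jacobi (diagonal) preconditioner stands in for AMG, dense Python lists stand in for the CUBLAS and CUDA ITSOL kernels, and the names `pcg`, `make_jacobi`, and `poisson_1d` are hypothetical.

```python
def matvec(A, x):
    # Dense stand-in for the SpMV step (assumption: the real solver uses a sparse GPU kernel).
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    # In a distributed run this local sum would be followed by a global sum
    # across ranks (e.g. an MPI Allreduce).
    return sum(a * b for a, b in zip(u, v))


def make_jacobi(A):
    # Jacobi preconditioner: divide the residual by the matrix diagonal.
    # Stand-in for the AMG preconditioner named in the slides.
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def poisson_1d(n):
    # Small symmetric positive definite test matrix (1D Laplacian stencil -1, 2, -1).
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def pcg(A, b, apply_precond, tol=1e-10, max_iter=1000):
    x = [0.0] * len(b)
    r = list(b)                      # r = b - A x, with x = 0
    z = apply_precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it             # converged after `it` iterations
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Calling `pcg(A, b, make_jacobi(A))` with `A = poisson_1d(5)` returns the solution vector and the iteration count. Swapping `apply_precond` for another callable is the only change needed to try a different preconditioner.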

15 GPU
  CUDA ITSOL (Li and Saad, 2011): CUDA, JAD, SpMV (Sparse Matrix-Vector product), GPU
  NVIDIA CUSP: AMG
  MPI, OpenFOAM

16 JAD: SPARSE MATRIX STORAGE
  Compressed Row Storage: wavefront ordering
  JAgged Diagonal storage: wavefront ordering
  JAD: CUDA
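The difference between Compressed Row Storage and JAgged Diagonal storage can be made concrete with a small conversion routine. The sketch below is pure Python and only illustrates the layout; the function names `csr_to_jad`, `jad_spmv`, and `csr_spmv` are hypothetical and are not taken from CUDA ITSOL. In JAD, rows are sorted by decreasing number of nonzeros, and the j-th nonzeros of all sufficiently long rows are stored contiguously as the j-th jagged diagonal. On a GPU this layout is commonly used so that neighbouring threads, one per row, read neighbouring memory locations.

```python
def csr_to_jad(indptr, indices, data):
    n = len(indptr) - 1
    row_len = [indptr[i + 1] - indptr[i] for i in range(n)]
    # Permutation that sorts rows by decreasing nonzero count (stable sort).
    perm = sorted(range(n), key=lambda i: -row_len[i])
    max_len = row_len[perm[0]] if n else 0
    jd_ptr, jd_col, jd_val = [0], [], []
    for j in range(max_len):
        for i in perm:
            if row_len[i] <= j:
                break  # rows are sorted, so all remaining rows are too short as well
            jd_col.append(indices[indptr[i] + j])
            jd_val.append(data[indptr[i] + j])
        jd_ptr.append(len(jd_val))  # start offset of the next jagged diagonal
    return perm, jd_ptr, jd_col, jd_val


def jad_spmv(perm, jd_ptr, jd_col, jd_val, x):
    y_sorted = [0.0] * len(perm)
    for j in range(len(jd_ptr) - 1):
        start, end = jd_ptr[j], jd_ptr[j + 1]
        # In a CUDA kernel this inner loop would be spread over threads, one per row.
        for k in range(start, end):
            y_sorted[k - start] += jd_val[k] * x[jd_col[k]]
    # Undo the row permutation.
    y = [0.0] * len(perm)
    for sorted_pos, row in enumerate(perm):
        y[row] = y_sorted[sorted_pos]
    return y


def csr_spmv(indptr, indices, data, x):
    # Reference row-by-row product on the CSR arrays.
    return [sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]
```

Both routines should produce the same vector for any CSR input, which gives a simple way to check a JAD conversion against a CSR reference.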

17 JAD SPMV
  [Figure: SpMV performance in Gflop/s for CPU-CSR, GPU-CSR, and GPU-JAD on the matrices bones01, parabolic_fem, and thermal; matrix-size values as extracted: , ,825 1,228,045 6,715,152 3,674,625 8,580,313]

18 OpenFOAM GPU JAD JAD 1

19 SPMV: MPI (Ghost cell)
  CPU (D2H), MPI, GPU (H2D), SpMV
  CUDA, MPI: Device2Host, Host2Device, SpMV
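The sequence on this slide (Device2Host copy of boundary values, MPI exchange of ghost cells, Host2Device copy, then SpMV) can be illustrated without MPI or CUDA. The following pure-Python sketch is an assumption-laden toy: plain lists stand in for device memory, list indexing stands in for the MPI exchange, a 1D Laplacian stencil stands in for the OpenFOAM matrix, and the names `local_spmv` and `distributed_spmv` are hypothetical.

```python
def local_spmv(x_owned, ghost_left, ghost_right):
    # Rows of the 1D stencil (-1, 2, -1) owned by one rank.
    # The two ghost values are copies of the neighbours' boundary entries.
    padded = [ghost_left] + x_owned + [ghost_right]
    return [-padded[i - 1] + 2.0 * padded[i] - padded[i + 1]
            for i in range(1, len(padded) - 1)]


def distributed_spmv(parts):
    # parts[r] is the slice of x owned by rank r (stand-in for GPU memory).
    # Step 1, stand-in for Device2Host: stage each rank's boundary values.
    send_left = [p[0] for p in parts]
    send_right = [p[-1] for p in parts]
    results = []
    for r, p in enumerate(parts):
        # Step 2, stand-in for the MPI exchange: receive the neighbours'
        # boundary values; outside the global domain the ghost value is zero.
        ghost_left = send_right[r - 1] if r > 0 else 0.0
        ghost_right = send_left[r + 1] if r + 1 < len(parts) else 0.0
        # Steps 3 and 4, stand-in for Host2Device and the SpMV kernel.
        results.append(local_spmv(p, ghost_left, ghost_right))
    return results
```

Concatenating the per-rank results gives the same vector as applying the stencil to the whole of x at once, which is the property the ghost-cell exchange is meant to preserve.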

20 AMG (Algebraic MultiGrid Preconditioner)
  NVIDIA CUSP LIBRARY: smoothed_aggregation


22 MRI, Gambit, OF, OF


24 simpleFoam (OpenFOAM-2.1.1)
  ν = [m^2/s] ( )
  V = 0.461 [m/s] (Re = 6500)
  P = 76 [Pa], P = 0 [Pa]
  0.6
  δp and δv
  GPU-AMG-CG, ILU-BiCG
  r

25 1778 SIMPLE

26
  SMALL        MEDIUM       LARGE
  1,912,272    2,980,302    5,144,
  MB           311 MB       543 MB

27
  EC2: CPU-ICCG
  EC2: GPU-AMGCG
  JAIST GPU Cluster: GPU-AMGCG

28 CG: EC2 vs. In-house, AMG-PCG inner loop
  [Figure: elapsed time [sec] vs. number of nodes for SMALL, MEDIUM, and LARGE; series: cg1.4xlarge (CPU-DIC), cg1.4xlarge (GPU-AMG), pcc-gpu (CPU-DIC), pcc-gpu (GPU-AMG)]

29 PCG: EC2 vs. In-house, CG LOOP (LARGE)
  [Figure: elapsed time [sec] vs. number of nodes; series: cg1.4xlarge (ICCG), pcc-gpu (AMGCG), cg1.4xlarge (AMGCG)]

30 SIMPLE (LARGE): EC2 vs. In-house, SIMPLE LOOP
  [Figure: elapsed time [sec] vs. number of threads; series: pcc-gpu, cg1.4xlarge]

31 r r ICCG AMG-CG

32 EC2, OpenFOAM, CUDA ITSOL, NVIDIA CUSP, GPU, 8

2. Amazon GPU Cluster Compute Instance
Amazon CCI Amazon EC2 CCI GPU Cluster GPU Quadruple Extra Large Instance (cg1.4xlarge) [6] On Demand Inhouse In

Amazon EC2 GPU OpenFOAM
Akihiko Saijo 1, Yasushi Inoguchi 1,2, Teruo Matsuzawa 1,3
VM HPC HPC Amazon EC2 GPGPU OpenFOAM GPU OpenFOAM MPI GPGPU 8 EC2 GPU, Cloud, CFD

1. HPC (Inhouse) IaaS (Infrastructure