
GPU-Accelerated OpenFOAM on Amazon EC2 GPU Cluster Compute Instances

Akihiko Saijo    Yasushi Inoguchi    Teruo Matsuzawa
Japan Advanced Institute of Science and Technology, School of Information Science

Abstract: HPC applications have conventionally been run on in-house clusters, but IaaS clouds now offer virtual machines (VMs) that can be assembled into HPC clusters on demand. This report evaluates the GPGPU-capable instances of Amazon EC2 by running a GPU-accelerated OpenFOAM CFD solver with MPI on up to 8 EC2 GPU instances and comparing the results with an in-house GPU cluster.

Keywords: EC2, GPU, Cloud, CFD

1. Introduction

HPC applications have traditionally been executed on in-house clusters. With IaaS (Infrastructure as a Service) offerings, PC-cluster-class resources can now be rented on demand instead, sparing users the cost of procuring and operating their own HPC systems. Whether MPI-parallel HPC applications run efficiently on such shared, virtualized infrastructure is not obvious, however, since MPI performance is sensitive to the interconnect and to virtualization overhead.

Amazon EC2 (Elastic Compute Cloud) addresses HPC workloads with its CCI (Cluster Compute Instance) family, and the Cluster GPU variant of the CCI additionally provides NVIDIA CUDA GPUs. In this report we evaluate the GPU CCI of Amazon EC2 by running a GPU-accelerated OpenFOAM solver on them and on an in-house GPU cluster, and discuss whether Amazon EC2 is a viable platform for this class of HPC workload.

2. Amazon GPU Cluster Compute Instances

Within Amazon EC2, the Cluster Compute Instances (CCI) are the instance family aimed at HPC, and the GPU-equipped member is the Cluster GPU Quadruple Extra Large Instance (cg1.4xlarge) [6], which we used under On-Demand pricing. As the in-house counterpart we used an Infiniband-connected GPU cluster (pcc-gpu). Table 1 summarizes the two systems.

3. MPI Communication Performance

3.1 Setup

The EC2 GPU CCI were launched in the US East (Virginia) region; each instance carries two GPUs. Amazon CCI are virtual machines (VMs) controlled through the EC2 API, and we used the Cluster GPU Amazon Linux AMI 2012.03, a Red Hat Enterprise Linux-derived image, as the guest OS. CCI run as HVM (Hardware Virtual Machine) guests, which is a prerequisite for using CUDA. The EC2 cluster was provisioned with StarCluster [7], and OpenFOAM was deployed onto the EC2 VMs with CloudFlu [8]. OpenFOAM was built with GCC and run with OpenMPI; HyperThreading on the Xeon was not used. Working directories were shared over NFS backed by EBS (Elastic Block Store) volumes for OpenFOAM I/O.

3.2 Results

We measured MPI performance with the Intel MPI Benchmarks (IMB), focusing on the patterns that dominate the solver: point-to-point exchange of ghost cells and small collective reductions. Fig. 1 compares IMB PingPong between two nodes on the CCI (cg1.4xlarge) and the in-house cluster (pcc-gpu); Fig. 2 compares IMB Allreduce with 8-byte messages as the number of nodes grows. In both benchmarks the EC2 CCI shows markedly higher latency than the Infiniband-based in-house cluster.

[Fig. 1: IMB PingPong (2 nodes), elapsed time [usec] vs. message size [byte]; EC2 CCI (cg1.4xlarge) vs. Inhouse Cluster (pcc-gpu)]
[Fig. 2: IMB Allreduce (8 bytes), elapsed time [usec] vs. number of nodes; EC2 CCI (cg1.4xlarge) vs. Inhouse Cluster (pcc-gpu)]

Table 1  Specifications: EC2 GPU Cluster Instance vs. in-house GPU cluster

                  cg1.4xlarge                             pcc-gpu
CPU               Intel Xeon X5570, 2.93 GHz              AMD Opteron 6136, 2.4 GHz
CPUs (cores)      2 (8, w/o HyperThreading)               2 (16)
Memory            22 GB                                   32 GB
GPUs              NVIDIA Tesla M2050 x2
Interconnect      10 Gigabit Ethernet                     Infiniband QDR
OS                Cluster GPU Amazon Linux AMI 2012.03    CentOS 6.2
Compiler          GNU GCC 4.4.6 (options: -O2 -fpic)
CUDA version      NVIDIA CUDA 4.2                         CUDA 4.1
MPI library       Open MPI 1.5.3                          MVAPICH2 1.7

4. GPU Acceleration of OpenFOAM

OpenFOAM solves the steady incompressible Navier-Stokes equations with the SIMPLE (Semi-Implicit Method for Pressure-Linked Equations) algorithm:

    ∇·(ρU) = 0,   (U·∇)U − ∇·(ν∇U) = −∇P        (1)

These equations are discretized with the finite volume method (FVM) [3]. The momentum equation at a cell node p reads

    a_p U_p = H(U) − ∇P,  i.e.  U_p = H(U)/a_p − (∇P)/a_p,        (2)

where H(U) = − Σ_{n ∈ NEIGH(p)} a_n U_n collects the neighbor (off-diagonal) contributions. The continuity equation is written as a sum over the cell faces,

    ∇·U = Σ_{f ∈ FACE} S·U_f,        (3)

where S is the face area vector of the FVM cell and U_f the velocity interpolated to face f,

    U_f = (H(U)/a_p)_f − ((∇P)/a_p)_f.        (4)

Substituting (4) into (3) yields the pressure equation

    ∇·((1/a_p) ∇P) = ∇·(H(U)/a_p) = Σ_f S·(H(U)/a_p)_f.        (5)

Algorithm 1  SIMPLE
 1: Initialize the velocity and pressure fields
 2: repeat
 3:   Solve the momentum predictor (2)
 4:   Assemble the pressure equation (5)
 5:   Solve the pressure equation with PCG
 6:   Correct the face fluxes
 7:   Correct the velocities and apply under-relaxation
 8: until (converged)

Discretizing (5) gives a linear system A·x = b with x = [P_1, P_2, ..., P_N] and b assembled from the right-hand side of (5). A is symmetric, so the system is solved with the conjugate gradient (CG) method. Within SIMPLE this pressure solve dominates the execution time [4], so it is the part we offload.

4.1 CG on the GPU

The dominant kernel of preconditioned CG (PCG) is the sparse matrix-vector product (SpMV), which we execute on the GPU following Li and Saad [2].
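As a concrete reference for the PCG iteration applied to the pressure system, the following is a minimal serial sketch in Python/NumPy. It is an illustration only, not the solver used in this work: a simple Jacobi preconditioner stands in for the DIC/AMG preconditioners discussed elsewhere in this report, and the ghost-cell exchanges and Allreduce calls of the parallel version are marked as comments.

```python
import numpy as np

def pcg(A, b, M_inv, x0=None, eps=1e-8, max_iter=200):
    """Serial preconditioned CG sketch.

    In the parallel version, the SpMV is preceded/followed by MPI
    ghost-cell exchanges, and each dot product needs an MPI_Allreduce;
    on a single process these are no-ops. M_inv applies M^-1.
    """
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x                     # initial residual
    z = M_inv(r)                      # preconditioned residual
    p = z.copy()                      # initial search direction
    rz = r @ z
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):
        q = A @ p                     # SpMV (ghost cells in parallel)
        alpha = rz / (p @ q)          # dot products -> MPI_Allreduce
        x += alpha * p
        r -= alpha * q
        if np.linalg.norm(r) / r0_norm <= eps:
            break
        z = M_inv(r)
        rz_new = r @ z
        beta = rz_new / rz            # dot product -> MPI_Allreduce
        p = z + beta * p
        rz = rz_new
    return x
```

For example, with a small symmetric positive definite A and a Jacobi preconditioner `M_inv = lambda r: r / np.diag(A)`, the iterate converges to the solution of A·x = b.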

Algorithm 2  Parallel Preconditioned Conjugate Gradient
 1: Given x_0.
 2: Let p_0 = b − A x_0, r_0 = p_0, z_0 = M⁻¹ r_0, k = 0.
 3: repeat
 4:   MPI Send ghost cells of p_k
 5:   q_k = A p_k
 6:   MPI Recv ghost cells of q_k
 7:   α_k = p_k^T r_k / p_k^T q_k
 8:   MPI Allreduce (SUM) for α_k
 9:   x_{k+1} = x_k + α_k p_k
10:   r_{k+1} = r_k − α_k q_k
11:   z_{k+1} = M⁻¹ r_{k+1}
12:   β_k = r_{k+1}^T q_k / p_k^T q_k
13:   MPI Allreduce (SUM) for β_k
14:   p_{k+1} = r_{k+1} + β_k p_k
15:   k = k + 1
16: until (‖r_{k+1}‖ / ‖r_0‖ ≤ ε)

For the SpMV on the GPU we use the kernel of CUDA ITSOL [2], which stores the matrix in the JAD (JAgged Diagonal) format. The matrix A that OpenFOAM assembles for the SIMPLE pressure equation (5) is therefore converted to JAD, as is the preconditioner M.

4.1.1 AMG preconditioner

As preconditioner for the GPU CG we build an AMG (Algebraic MultiGrid) hierarchy with the CUDA library CUSP [9], using smoothed aggregation in single precision (float). Since each node carries two GPUs, one GPU is assigned to each MPI process, and the two GPUs of a node exchange data via peer-to-peer (P2P) transfers.

4.1.2 Overlapping communication

OpenFOAM's MPI exchanges operate on host memory, so every exchange requires GPU-CPU transfers over the PCIe bus before and after the MPI calls. We overlap the SpMV with these transfers using CUDA Streams.

4.2 Benchmark problem

The benchmark is steady blood flow in a thoracic aorta. The geometry was reconstructed from MRI data, meshed with ANSYS Gambit, and converted to OpenFOAM format. We prepared three meshes, SMALL, MEDIUM and LARGE (Table 2), and decomposed them for parallel runs with Scotch [10].
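The JAD storage used by the ITSOL kernel can be illustrated with a small NumPy sketch (an illustration only, not the CUDA code): rows are permuted by decreasing nonzero count, and the k-th nonzero of every remaining row is packed into a contiguous "jagged diagonal", which is the long-vector access pattern that lets GPU threads read the matrix with coalesced loads.

```python
import numpy as np

def to_jad(A):
    """Convert a dense matrix to JAD (JAgged Diagonal) storage."""
    n = A.shape[0]
    cols = [np.flatnonzero(A[i]) for i in range(n)]
    # Permute rows by decreasing number of nonzeros (stable sort).
    perm = np.argsort([-len(c) for c in cols], kind="stable")
    njd = len(cols[perm[0]]) if n else 0   # number of jagged diagonals
    jdiag, jcol, jlen = [], [], []
    for k in range(njd):
        rows = [p for p in perm if len(cols[p]) > k]
        jlen.append(len(rows))
        jdiag.append(np.array([A[p, cols[p][k]] for p in rows]))
        jcol.append(np.array([cols[p][k] for p in rows]))
    return perm, jdiag, jcol, jlen

def jad_spmv(perm, jdiag, jcol, jlen, x):
    """y = A @ x using JAD storage: one long AXPY-like pass per diagonal."""
    y_perm = np.zeros(len(perm))
    for val, col, m in zip(jdiag, jcol, jlen):
        y_perm[:m] += val * x[col]
    y = np.empty_like(y_perm)
    y[perm] = y_perm                       # undo the row permutation
    return y
```

On the GPU, each thread handles one permuted row, so consecutive threads read consecutive elements of each jagged diagonal; the sketch above reproduces the arithmetic of that layout serially.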

Table 2  Meshes
                 SMALL        MEDIUM      LARGE
Cells            1,912,272    2,98,32     5,144,73
Faces            3,874,336    6,31,26     1,382,979
Memory [MB]      155          311         543

Table 3  Simulation parameters
Solver                  simpleFoam (OpenFOAM-2.1.1)
Viscosity               ν = 3.33×10⁻⁶ [Pa·s] (blood)
Inlet velocity          V = 0.263 [m/s] (Re = 3)
Outlet pressure         P = 0 [Pa]
SIMPLE convergence      δP < 1.0×10⁻⁶ and δV < 1.0×10⁻⁶
Linear solvers          GPU-AMG-CG / ILU-BiCG, ‖r‖ < 1.0×10⁻⁸

[Fig. 5: AMG-PCG inner loop, elapsed time [sec] for SMALL/MEDIUM/LARGE; cg1.4xlarge and pcc-gpu, each with CPU-DIC and GPU-AMG]
[Fig. 6: EC2 vs. Inhouse, AMG-CG inner loop (LARGE), elapsed time [sec] vs. number of nodes (1, 2, 4, 8)]
[Fig. 7: EC2 vs. Inhouse, SIMPLE outer loop (LARGE), elapsed time [sec] vs. number of nodes (1, 2, 4, 8)]

4.3 Results

Fig. 5 compares, on a single node, the CPU solver with DIC-CG preconditioning against the GPU solver with AMG-CG, on both the GPU CCI and the in-house cluster, for the SMALL, MEDIUM and LARGE meshes. On both systems the GPU AMG-CG outperforms the CPU DIC-CG, and the advantage grows with the mesh size.

For the LARGE mesh, Figs. 6 and 7 show the elapsed time of one AMG-CG inner iteration and of one SIMPLE outer iteration on 1, 2, 4 and 8 nodes. The in-house Infiniband cluster continues to benefit from additional nodes, whereas on EC2 the MPI communication overhead grows with the node count and limits the speedup at 8 nodes.

5. Related Work

Zhai et al. [11] evaluated the Amazon EC2 cluster compute instances against an in-house cluster using IMB and the NAS Parallel Benchmarks.

6. Conclusion

We evaluated the GPU Cluster Compute Instances of Amazon EC2 with the IMB microbenchmarks and with a GPU-accelerated (GPU-AMG-CG) OpenFOAM solver, comparing them against an in-house GPU cluster, to characterize how far EC2 can currently substitute for an in-house HPC system.

References

[1] Malecha, Z. M., Miroslaw, L., Tomczak, T., Koza, Z., Matyka, M., Tarnawski, W. and Szczerba, D.: GPU-based simulation of 3D blood flow in abdominal aorta using OpenFOAM. Archives of Mechanics, Vol. 63, No. 2, pp. 137-161, 2011.
[2] Li, R. and Saad, Y.: GPU-accelerated preconditioned iterative linear solvers. Report umsi-2010-112, Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, MN, 2010.
[3] The SIMPLE algorithm in OpenFOAM - OpenFOAMWiki, http://openfoamwiki.net/index.php/The_SIMPLE_algorithm_in_OpenFOAM
[4] Ferziger, J. H. and Peric, M.: Computational Methods for Fluid Dynamics. Springer-Verlag, Berlin Heidelberg, 1996.
[5] Saad, Y.: Iterative Methods for Sparse Linear Systems. PWS Publishing Co., Boston, MA, 2000.
[6] Amazon: EC2 Instance Types (online): https://aws.amazon.com/ec2/instance-types/
[7] StarCluster: http://web.mit.edu/star/cluster/
[8] Petrov, A. and Simurzin, A.: CloudFlu. http://sourceforge.net/apps/mediawiki/cloudflu/index.php?title=Main_Page
[9] Bell, N. and Garland, M.: Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations, 2012. http://cusplibrary.googlecode.com
[10] Pellegrini, F. and Roman, J.: SCOTCH: A Software Package for Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs. Proceedings of HPCN'96, Brussels, Belgium, LNCS 1067, pp. 493-498, Springer, April 1996. www.labri.fr/perso/pelegrin/scotch/
[11] Zhai, Y., Liu, M., Zhai, J., Ma, X. and Chen, W.: Cloud versus in-house cluster: evaluating Amazon cluster compute instances for running MPI applications. State of the Practice Reports (SC '11), ACM, New York, NY, USA, Article 11, 10 pages, 2011.