
Introduction to GPU Use in CAE
Kenichi Hayashi, Director, Marketing Division, NVIDIA Japan

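About NVIDIA
Founded in 1993
Reached USD 1 billion in revenue faster than any other semiconductor company since its founding
Founder: Jen-Hsun Huang
Employees: about 8,500 in 20 countries
Headquarters: Santa Clara, California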

CPU + GPU = Speedup
Adding a GPU to the CPU as a companion processor accelerates applications and enables high-performance computing

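How GPU Acceleration Works
Application code = compute-intensive portions + remaining sequential code
GPU: the compute-intensive portions (a few percent of the total code)
CPU: the remaining sequential code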

Titan: The World's Fastest Open Supercomputer
18,688 Tesla K20X GPUs
Peak performance: 27 petaflops (90% of the performance comes from GPUs)
Linpack performance: 17.59 petaflops
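The Titan figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide; the function names are illustrative, and the reading of "90% of the performance" as a share of the 27-petaflop peak is an assumption.

```python
# Figures quoted on the Titan slide.
PEAK_PFLOPS = 27.0       # peak performance
LINPACK_PFLOPS = 17.59   # Linpack performance
GPU_COUNT = 18_688       # Tesla K20X GPUs
GPU_SHARE = 0.90         # assumed to be the GPU share of peak performance


def linpack_efficiency(linpack_pflops: float, peak_pflops: float) -> float:
    """Fraction of peak performance achieved on Linpack."""
    return linpack_pflops / peak_pflops


def peak_per_gpu_tflops(peak_pflops: float, gpu_share: float, gpu_count: int) -> float:
    """Peak TFLOPS attributed to each GPU (1 petaflop = 1000 teraflops)."""
    return peak_pflops * 1000.0 * gpu_share / gpu_count


if __name__ == "__main__":
    eff = linpack_efficiency(LINPACK_PFLOPS, PEAK_PFLOPS)
    per_gpu = peak_per_gpu_tflops(PEAK_PFLOPS, GPU_SHARE, GPU_COUNT)
    print(f"Linpack efficiency: {eff:.1%}")
    print(f"Peak per GPU: {per_gpu:.2f} TFLOPS")
```

Under these assumptions the per-GPU figure comes out at roughly 1.3 TFLOPS, which is in line with the "more than 1 TFLOP of double-precision performance" quoted for the architecture.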

TSUBAME 2.0
1,408 nodes
4,224 GPUs = 2,175 TFlops
2,816 CPUs = 216 TFlops
Memory = 80.55 TB
SSD = 173.88 TB
[Chart: TSUBAME 2.0 performance, GPU share 91%]
Node configuration (HP SL390 server): 3x NVIDIA Tesla M2050 GPU, 2x Intel Westmere-EP CPU, 52 GB DDR3 memory, 2x 60 GB SSD, 2x QDR InfiniBand
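The 91% GPU share follows directly from the TFlops figures on this slide, and the GPU and CPU totals match the per-node configuration. A minimal sketch of that check, using only the slide's numbers (the function names are illustrative):

```python
# Figures quoted on the TSUBAME 2.0 slide.
NODES = 1_408
GPUS, GPU_TFLOPS = 4_224, 2_175.0
CPUS, CPU_TFLOPS = 2_816, 216.0


def gpu_share(gpu_tflops: float, cpu_tflops: float) -> float:
    """Fraction of the combined TFlops contributed by the GPUs."""
    return gpu_tflops / (gpu_tflops + cpu_tflops)


def per_node(total: int, nodes: int) -> float:
    """Average number of devices per node."""
    return total / nodes


if __name__ == "__main__":
    print(f"GPU share: {gpu_share(GPU_TFLOPS, CPU_TFLOPS):.0%}")
    print(f"GPUs per node: {per_node(GPUS, NODES):.0f}")
    print(f"CPUs per node: {per_node(CPUS, NODES):.0f}")
```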

Architecture
7.1 billion transistors
Up to 15 SMX units
More than 1 TFLOP of double-precision performance
1.5 MB L2 cache
384-bit GDDR5

The world's fastest accelerator
The world's highest efficiency
CUDA, the world's most widely adopted parallel programming model
18,688 units installed in TITAN

[Chart: Single Precision FLOPS (SGEMM), in TFLOPS] Xeon E5-2690: 0.36; Tesla M2090: 0.89; Tesla K20X: 2.90
[Chart: Double Precision FLOPS (DGEMM), in TFLOPS] Xeon E5-2690: 0.18; Tesla M2090: 0.40; Tesla K20X: 1.22

                                Tesla K20X   Tesla K20
CUDA cores                      2688         2496
Double-precision performance    1.32 TF      1.17 TF
  DGEMM                         1.22 TF      1.10 TF
Single-precision performance    3.95 TF      3.52 TF
  SGEMM                         2.90 TF      2.61 TF
Memory bandwidth                250 GB/s     208 GB/s
Memory size                     6 GB         5 GB
Power consumption               235W         225W
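A few ratios help when reading these numbers. The sketch below computes GEMM efficiency against the quoted double- and single-precision figures (treated here as peak values, which is an assumption), DGEMM throughput per watt of the quoted power figure, and the K20X speedup over the Xeon E5-2690. The dictionary layout and function names are illustrative.

```python
# Values from the slide, in TFLOPS unless noted. "dp"/"sp" are the quoted
# double/single-precision figures, assumed here to be peak values.
BOARDS = {
    "Tesla K20X": {"dp": 1.32, "dgemm": 1.22, "sp": 3.95, "sgemm": 2.90, "watts": 235},
    "Tesla K20": {"dp": 1.17, "dgemm": 1.10, "sp": 3.52, "sgemm": 2.61, "watts": 225},
}
XEON_E5_2690 = {"dgemm": 0.18, "sgemm": 0.36}


def gemm_efficiency(measured_tflops: float, peak_tflops: float) -> float:
    """Measured GEMM throughput as a fraction of the quoted peak."""
    return measured_tflops / peak_tflops


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Throughput per watt, using the board power figure as a rough denominator."""
    return tflops * 1000.0 / watts


def speedup(accelerated_tflops: float, baseline_tflops: float) -> float:
    """Throughput ratio of the accelerator to the baseline processor."""
    return accelerated_tflops / baseline_tflops


if __name__ == "__main__":
    for name, b in BOARDS.items():
        print(name,
              f"DGEMM eff {gemm_efficiency(b['dgemm'], b['dp']):.1%},",
              f"SGEMM eff {gemm_efficiency(b['sgemm'], b['sp']):.1%},",
              f"{gflops_per_watt(b['dgemm'], b['watts']):.2f} DGEMM GFLOPS/W")
    k20x = BOARDS["Tesla K20X"]
    print(f"K20X vs Xeon, DGEMM: {speedup(k20x['dgemm'], XEON_E5_2690['dgemm']):.1f}x")
    print(f"K20X vs Xeon, SGEMM: {speedup(k20x['sgemm'], XEON_E5_2690['sgemm']):.1f}x")
```

The per-watt figure covers the GPU board only, not the host system, so it should be read as an upper bound on system-level efficiency.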

Commercial CAE Software and GPU Progress
(Green color in the original slide indicates CUDA-ready during 2013)

ISV            Primary Applications
ANSYS          ANSYS Mechanical; ANSYS Fluent; ANSYS HFSS
DS SIMULIA     Abaqus/Standard; Abaqus/Explicit; Abaqus/CFD
MSC Software   MSC Nastran; Marc; Adams
Altair         RADIOSS; AcuSolve
CD-adapco      STAR-CD; STAR-CCM+
Autodesk       AS Mechanical, Moldflow, AS CFD
ESI Group      PAM-CRASH imp; CFD-ACE+
Siemens        NX Nastran
LSTC           LS-DYNA; LS-DYNA CFD
Mentor         FloEFD, FloTherm
Metacomp       CFD++

Other Commercial CAE and GPU Progress

ISV            Domain          Location      Primary Applications
FluiDyna       CFD             Germany       Culises for OpenFOAM; LBultra
Vratis         CFD             Poland        Speed-IT for OpenFOAM; ARAEL
Prometech      CFD             Japan         Particleworks
Turbostream    CFD             England, UK   Turbostream
IMPETUS        Explicit FEA    Sweden        AFEA
AVL            CFD             Austria       FIRE
CoreTech       CFD (molding)   Taiwan        Moldex3D
Intes          Implicit FEA    Germany       PERMAS
Next Limit     CFD             Spain         XFlow
CPFD           CFD             USA           BARRACUDA
Flow Science   CFD             USA           FLOW-3D

2013: Further Expansion of OF Community
ESI acquisition of OpenCFD from SGI during Sep 2012
IDAJ investment in ICON (migration from CD-adapco)
This year, 3 global OpenFOAM user conferences:
  APR 24-26, Frankfurt, DE: ESI OpenFOAM Users Conference (first ever)
    http://www.esi-group.com/corporate/events/2013/openfoam2013
    Concentration on OpenFOAM from OpenCFD
  JUN 11-14, Jeju, KR: 8th International OpenFOAM Workshop (first in Asia)
    http://www.openfoamworkshop2013.org/
    Concentration on OpenFOAM-extend and Wikki
  OCT 24-25, Hamburg, DE: 7th Open Source CFD International Conference (ICON)
    http://www.opensourcecfd.com/conference2013/
    Concentration on both OpenFOAM and OpenFOAM-extend

NVIDIA Market Strategy for OpenFOAM
Provide technical support for commercial GPU solver developments
  FluiDyna Culises library with NVIDIA collaboration on AMG
  Vratis Speed-IT library, development of CUSP-based AMG
Invest in alliances (but not development) with key OpenFOAM organizations
  ESI and OpenCFD Foundation (H. Weller, M. Salari)
  Wikki and OpenFOAM-extend community (H. Jasak)
  IDAJ in Japan and ICON in the UK: support for both OF and OF-ext
Conduct performance studies and customer benchmark evaluations
  Collaborations: developers, customers, OEMs (Dell, SGI, HP, etc.)

OpenFOAM Applied Use: Parameter Optimization
#1: Develop validated CFD model in ANSYS Fluent or other commercial CFD software in production
#2: Develop CFD model in OpenFOAM, validate against commercial CFD model
#3: Conduct parameter sweeps with OpenFOAM model to save on commercial CFD license costs
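Step #3 amounts to enumerating every combination of the design parameters and running one case per combination. The sketch below shows only that enumeration. The parameter names and values are hypothetical, and no OpenFOAM call is made; how each case is actually set up and launched depends on the model and is left out.

```python
from itertools import product


def build_sweep(parameters: dict[str, list[float]]) -> list[dict[str, float]]:
    """Return one case dictionary per combination of parameter values."""
    names = list(parameters)
    return [dict(zip(names, values))
            for values in product(*(parameters[n] for n in names))]


if __name__ == "__main__":
    # Hypothetical design parameters, chosen purely for illustration.
    sweep = build_sweep({
        "inlet_velocity_m_s": [10.0, 20.0, 30.0],
        "angle_of_attack_deg": [0.0, 2.0],
    })
    for case_id, case in enumerate(sweep):
        print(case_id, case)
```

The number of cases grows as the product of the value counts, which is why a sweep can multiply the job load on a cluster by a large factor.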

GPU Opportunity for Parameter Optimization
Problem Statement:
  Demand for optimization: can existing CPU clusters manage 10x more jobs?
  Examples: automotive crashworthiness optimization; jet engine CFD aerodynamics optimization
GPU Opportunity:
  Open source and proprietary: not bounded by commercial CAE license costs
  ISV optimization licensing solved (ANSYS, Altair, SIMULIA, etc.): hardware problem next
  GPUs: performance under smaller footprint with better power and cooling efficiency

OpenFOAM GPU Focus on Implicit Sparse Solvers
[Diagram: OpenFOAM software split between CPU and GPU]
CPU: Read input, matrix set-up
GPU: Implicit sparse matrix operations (40% - 65% of profile time, small % of LoC)
  - Hand-CUDA parallel
  - GPU libraries, CUBLAS
  - OpenACC directives
  (Investigating OpenACC for more tasks on GPU)
CPU: Global solution, write output
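A typical building block of implicit sparse solvers is the sparse matrix-vector product, repeated many times per solve. The sketch below is a plain-Python reference version using compressed sparse row (CSR) storage. CSR is an assumption chosen for illustration and the slide does not say which storage format or kernels are used. Each output row is independent of the others, which is what makes this operation a natural candidate for one-thread-per-row GPU parallelism.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each nonzero entry
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits the entries of row i
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # rows are independent of one another
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


if __name__ == "__main__":
    # 3x3 tridiagonal example: [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
    values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    row_ptr = [0, 2, 5, 7]
    print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
```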

ANSYS Fluent Preview for 2 x CPU + 2 x Tesla K20X
ANSYS Fluent 15.0 Preview Performance, Results by NVIDIA, Feb 2013
[Chart: solver time, 2 x K20X vs. E5_2680(16); lower is better. Cases: Helix (tet 1173K), Airfoil (hex 784K). Speedup labels: 2.1x, 1.7x]
2 x E5_2680 SB CPUs, 16 cores total, only 2 cores used with GPUs
Solver settings:
  CPU Fluent solver: F-cycle, agg8, DILU, 0pre, 3post
  GPU nvamg solver: V-cycle, agg8, MC-DILU, 0pre, 3post
NOTE: Times for solver only

Culises: New CFD Solver Library for OpenFOAM
www.fluidyna.de
FluiDyna: TU Munich spin-off from 2006
Culises features:
  Culises provides a linear solver library
  Culises requires only two edits to control file of OpenFOAM
  Multi-GPU ready
  Contact FluiDyna for license details

Culises Coupling to OpenFOAM
Culises coupling is user-transparent: www.fluidyna.de

Culises: New CFD Solver Library for OpenFOAM
www.fluidyna.de
Easy-to-Use
#1. Download and license from www.fluidyna.de
#2. Install with script provided by FluiDyna
#3. Activate Culises and use of GPUs with 2 simple changes to OF config-file

OpenFOAM GPU Speedups Based on Application
Speedups for a range of industrial cases: www.fluidyna.de

Accelerating the Numerical Simulation of Heavy-Vehicle Aerodynamics Using GPUs with Culises
Dr. Bjoern Landmann
ISC 2013, Leipzig, June 2013
FluiDyna GmbH, Lichtenbergstraße 8, D-85748 Garching b. München, www.fluidyna.com

Culises Multi-GPU runs
(Culises - A Library for Accelerated CFD on Hybrid GPU-CPU Systems, B. Landmann)
Speedup by adding multiple GPUs:

(a) single-socket board
Mesh - # CPUs                    9M - 1 CPU   18M - 1 CPU   27M - 1 CPU   36M - 1 CPU
# GPUs added                     +1 GPU       +2 GPUs       +3 GPUs       +4 GPUs
Speedup linear solver a          3.5          5.7           7.8           10.6
Speedup total simulation         1.45         1.59          1.67          1.74
Theoretical max speedup s_max    1.78         1.82          1.85          1.89

(b) dual-socket board
Mesh - # CPUs                    9M - 2 CPU   18M - 2 CPU   27M - 2 CPU   36M - 2 CPU
# GPUs added                     +1 GPU       +2 GPUs       +3 GPUs       +4 GPUs
Speedup linear solver a          2.5          4.2           6.2           6.9
Speedup total simulation         1.36         1.52          1.63          1.67
Theoretical max speedup s_max    1.78         1.82          1.85          1.89
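The three speedup rows in this table are consistent with Amdahl's law, assuming that s_max is the limit 1/(1 - f) reached when the linear solver, which takes a fraction f of the runtime, becomes infinitely fast. That reading is an assumption and the slide does not state how s_max was derived. Under it, f = 1 - 1/s_max, and the total speedup for a linear-solver speedup a is 1/((1 - f) + f/a). A sketch that recomputes the total speedup for every column:

```python
# (board, GPUs added, linear-solver speedup a, reported total speedup, s_max)
ROWS = [
    ("single-socket", 1, 3.5, 1.45, 1.78),
    ("single-socket", 2, 5.7, 1.59, 1.82),
    ("single-socket", 3, 7.8, 1.67, 1.85),
    ("single-socket", 4, 10.6, 1.74, 1.89),
    ("dual-socket", 1, 2.5, 1.36, 1.78),
    ("dual-socket", 2, 4.2, 1.52, 1.82),
    ("dual-socket", 3, 6.2, 1.63, 1.85),
    ("dual-socket", 4, 6.9, 1.67, 1.89),
]


def accelerated_fraction(s_max: float) -> float:
    """Runtime fraction spent in the linear solver, assuming s_max = 1 / (1 - f)."""
    return 1.0 - 1.0 / s_max


def total_speedup(a: float, s_max: float) -> float:
    """Amdahl's law: only the fraction f of the runtime is sped up by a factor a."""
    f = accelerated_fraction(s_max)
    return 1.0 / ((1.0 - f) + f / a)


if __name__ == "__main__":
    for board, gpus, a, reported, s_max in ROWS:
        print(f"{board:14s} +{gpus} GPU  f={accelerated_fraction(s_max):.2f}  "
              f"predicted={total_speedup(a, s_max):.2f}  reported={reported:.2f}")
```

The predicted values agree with the reported totals to within 0.01 in every column. The implied solver fraction of roughly 44% to 47% explains why even a 10.6x faster linear solver yields less than a 2x gain for the whole simulation.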

Event: GTC Japan 2013
Organizer: NVIDIA Japan
Date and time: Tuesday, July 30, 2013, 10:00-18:30
Venue: Tokyo Midtown Hall
Admission: Free
Event site: http://www.gputechconf.jp

Thank you