Fundamentals of Parallel Programming That Carry Over to Supercomputers


furihata@cmc.osaka-u.ac.jp (Cybermedia Center, Osaka University) 2018.09.10


Windows, Mac, Unix

Part I: Unix
GUI, CUI: Unix, Windows, Mac OS
Part II

9/10
9/13 *
9/19 SX-ACE *
9/20 (for OCTOPUS, VCC) *
9/26 SX-ACE (MPI) *
10/10 AVS
10/11 AVS
* http://www.hpc.cmc.osaka-u.ac.jp/lecture_event/lecture/

Part I: UNIX

CUI

GUI and CUI
GUI (Graphical User Interface): OS (Windows, MacOS X, Unix)
CUI (Character User Interface) / CLI (Command Line Interface): Unix
Unix = CUI/CLI

GUI and CUI

Tips for understanding the CUI (I)
The GUI is a map; the CUI is a photograph.
GUI = map
CUI = photograph
In the CUI, what matters is where you are right now. Basically, you walk there yourself (that is, you move with commands such as cd).

Tips for understanding the CUI (II)
CUI = (shell)

Tips for understanding the CUI (III)
CUI, Unix, ssh

Tips for understanding the CUI (IV)
Editors in the CUI: Emacs, vi (commands: emacs, vi)

Unix

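Unix commands: directories
pwd: print the current directory
cd ..: move to the parent directory
cd hoge: move into the directory hoge
mkdir hoge: create the directory hoge
rmdir hoge: remove the (empty) directory hoge
mv hoge poko: rename the directory hoge to poko, or, if a directory poko already exists, move hoge into it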

Unix commands: files
ls: list the files in the current directory
touch hoge: create an empty file hoge
rm hoge: delete the file hoge
mv hoge poko: rename the file hoge to poko, or, if poko is a directory, move hoge into it

Unix commands: file contents
less hoge: view the contents of the file hoge (similar commands: more, cat)
grep kore: search for lines containing the string kore
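For readers who find a script easier to follow than a command list, the same walk through a directory tree can be sketched in Python. This is only an illustration, not part of the lecture: the file names memo.txt and poko.txt and the function name walk_through are hypothetical, and everything happens inside a throwaway temporary directory. Each line is annotated with the shell command it mirrors.

```python
import os
import tempfile
from pathlib import Path


def walk_through(base):
    """Replay the basic commands inside the directory `base`."""
    start = Path.cwd()
    try:
        os.chdir(base)                        # cd base
        Path("hoge").mkdir()                  # mkdir hoge
        os.chdir("hoge")                      # cd hoge
        here = Path.cwd().name                # pwd (last path component only)
        Path("memo.txt").touch()              # touch memo.txt
        Path("memo.txt").rename("poko.txt")   # mv memo.txt poko.txt
        Path("poko.txt").write_text("kore\nsore\n")
        listing = sorted(os.listdir("."))     # ls
        hits = [line for line in Path("poko.txt").read_text().splitlines()
                if "kore" in line]            # grep kore poko.txt
        Path("poko.txt").unlink()             # rm poko.txt
        os.chdir("..")                        # cd ..
        Path("hoge").rmdir()                  # rmdir hoge
        remaining = os.listdir(".")           # ls
    finally:
        os.chdir(start)                       # go back to where we started
    return here, listing, hits, remaining


with tempfile.TemporaryDirectory() as tmp:
    result = walk_through(tmp)
    print(result)
```

The point of the sketch is the same as the map and photograph analogy: every step depends on the current directory, so the order of the cd calls matters.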

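Unix commands (editors): Emacs
emacs hoge: open the file hoge in Emacs (in Emacs notation, C- means hold Ctrl and M- means press Esc)
C-x C-f: open a file
C-x C-s: save
C-x C-c: quit
C-g: cancel the current command
C-s hoge: search for hoge
Copy and paste in Emacs: C-Space sets the mark, M-w copies, C-w cuts, C-y pastes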

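Unix commands (editors): vi
vi hoge: open the file hoge in vi
i: enter insert mode; Esc: return to command mode
h, j, k, l: move the cursor (left, down, up, right)
:wq: save and quit
:q!: quit without saving
x, dd: delete one character, delete one line
yy, p: copy one line, paste it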

Emacs + vi = spacemacs: first release 2014.10; ver. 0.200.13.

Unix
Unix = CUI
CUI = (shell)
Emacs, vi

Part II:

Part ( ) GO


Vector (SIMD): SX-9 (NEC), 100 GFlops on a single core. * 2007.
Top500, 2018.06:
Summit (IBM Power9 chip, Top500 #1): 122 PFlops, 2,282,544 cores
Sunway TaihuLight (Top500 #2): 93 PFlops, 10,649,600 cores
ABCI (Top500 #5): 19.9 PFlops, 391,680 cores
K computer (Top500 #16): 10.5 PFlops, 705,024 cores
Memory: Summit 2.8 PB, Sunway TaihuLight 1.3 PB, ABCI 420 TB, K computer 1.26 PB
Power: Summit 8.8 MW, Sunway TaihuLight 15.3 MW, ABCI 1.6 MW, K computer 12.7 MW
* 27 13 MW 3

SX-ACE (3 clusters): 423 TFlops = 0.423 PFlops, 6144 cores.
* About 60% of the performance of the 500th entry (715 TFlops) in the June 2018 TOP500.
Memory: 96 TB = 0.1 PB.
Power: 700 kW.
* Relative to Summit: performance 0.35%, memory 3.5%, power 8%.
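The percentages relative to Summit can be checked from the figures quoted on these two slides. The short sketch below assumes that the three percentages are, in order, performance, memory, and power; the dictionary keys and the helper name percent_of are made up for illustration.

```python
# Figures as quoted in the slides (Top500, June 2018).
summit = {"pflops": 122.0, "memory_pb": 2.8, "power_mw": 8.8}
sx_ace = {"pflops": 0.423, "memory_pb": 0.096, "power_mw": 0.7}  # 96 TB, 700 kW


def percent_of(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


ratios = {key: percent_of(sx_ace[key], summit[key]) for key in summit}
for key, value in ratios.items():
    print(f"{key}: {value:.2f}%")
```

With 96 TB the memory ratio comes out slightly under 3.5%, and with the rounded 0.1 PB slightly over, so the slide's 3.5% is consistent with either reading.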

I
(Image: Creative Commons Attribution-ShareAlike 3.0, by A.I.Graphic)

II
ILLIAC I, II, III (1952, 1962, SIMD 1966)
Cray-1 (1976, 80-160 MFLOPS)
Earth Simulator (SX-5, 41 TFLOPS, 2002; SX-9, 131 TFLOPS, 2009). TOP500 #1 from 2002.06 to 2004.06: the "Japanese Computenik"
Blue Gene (2004): 32,768 cores; 2007: 212,992 cores
K computer (2011): 10.62 PFLOPS
Tianhe-2 (2013): 33.8 PFLOPS
Sunway TaihuLight (2016): 93 PFLOPS
Summit (2018): 122 PFLOPS

III
US (124/top 500, 2017.11): CPU, Cray, IBM
Japan (36): NEC
China (206): CPU
Others: (22), (21), (18), 10 (top500)


I
Input Data → Operation → Output Data

II
Input → Operation → Output
1 + 2 + 3 + ... + 100 = 5050
Loop: for i := 1 to 100 do result += i;
Closed form: ( 100 + 1 ) * 100 / 2
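The two ways of computing 1 + 2 + ... + 100 shown on this slide can be written side by side. This is a minimal sketch; the function names are chosen here for illustration.

```python
def sum_by_loop(n):
    """Add 1, 2, ..., n one term at a time (n additions)."""
    result = 0
    for i in range(1, n + 1):
        result += i
    return result


def sum_by_formula(n):
    """Closed form (n + 1) * n / 2: a constant amount of work."""
    return (n + 1) * n // 2


print(sum_by_loop(100), sum_by_formula(100))
```

Both return the same value, but the loop does work proportional to n while the formula does not. Changing the operation itself in this way is a speed-up that needs no special hardware.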

III-1: CPU, HDD
Input → Fast Operation → Output

III-2
[Figure: CPU clock (MHz), logarithmic axis from 0.1 to 10000, versus year, 1970 to 2015]

IV-1: SIMD (Single Instruction Multiple Data)
Input → Vector Operation → Output

IV-2

V-1
Input → Parallel Operation → Output

V-2

VI-1
Input → Parallel Operation → Output

VI-2

VII

VIII: NEC SX, PC, GPU, SIMD


I
With parallelizable fraction $a$ (so the serial fraction is $1 - a$), number of processors $n$, and overhead $\delta$:

$$\text{Speed-up ratio} = \frac{1}{\dfrac{(1+\delta)\,a}{n} + (1-a)}$$

[Figures: speed-up ratio versus number of processors (up to 100 and up to 1000) for a = 50%, 80%, 90%, 95%, 99%, with δ = 0%]

II
[Figures: speed-up ratio versus number of processors (up to 100 and up to 1000) for a = 50%, 80%, 90%, 95%, 99%, with δ = 50% and with δ = 200%]
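The curves in these figures can be reproduced numerically. The sketch below assumes the speed-up formula reads 1 / ((1 + δ) a / n + (1 - a)), where a is the parallelizable fraction, n the number of processors, and δ a relative overhead on the parallel part. That reading is a reconstruction from the extracted slide text, and with δ = 0 it reduces to Amdahl's law. The function name speed_up is chosen here for illustration.

```python
def speed_up(a, n, delta=0.0):
    """Speed-up ratio 1 / ((1 + delta) * a / n + (1 - a))."""
    return 1.0 / ((1.0 + delta) * a / n + (1.0 - a))


# Same parameter grid as the figures: delta = 0%, 50%, 200%.
for delta in (0.0, 0.5, 2.0):
    for a in (0.5, 0.8, 0.9, 0.95, 0.99):
        row = [round(speed_up(a, n, delta), 1) for n in (10, 100, 1000)]
        print(f"delta={delta:.0%} a={a:.0%}: n=10,100,1000 -> {row}")
```

Under this formula the speed-up can never exceed 1 / (1 - a), however many processors are added. With a = 50% the ceiling is 2, which is why the serial part of a program dominates at scale.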

III


MPI (Message Passing Interface), CPU
OpenMP (Open Multi-Processing), CPU
OpenMP, SIMD, CUDA

I: SIMD
for i := 1 to 10000 do a[i] := 2*i;
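The loop on this slide is a good SIMD candidate because each a[i] depends only on i, so several elements can be computed by one instruction. The sketch below only simulates that idea in plain Python by processing the index range in fixed-width groups. The lane width of 4 and the function names are assumptions made for illustration, and no real vector instructions are involved.

```python
LANES = 4  # hypothetical vector width


def fill_scalar(n):
    """One element per step, as in the original loop (index 0 is unused)."""
    a = [0] * (n + 1)
    for i in range(1, n + 1):
        a[i] = 2 * i
    return a


def fill_in_lanes(n, lanes=LANES):
    """Same result, but each step handles up to `lanes` elements at once."""
    a = [0] * (n + 1)
    for start in range(1, n + 1, lanes):
        idx = range(start, min(start + lanes, n + 1))
        # One simulated vector operation: the same "2 * i" applied to every lane.
        a[start:start + len(idx)] = [2 * i for i in idx]
    return a


print(fill_in_lanes(10))
```

Because no iteration reads a value written by another iteration, the grouped version gives the same array as the scalar one. A loop with such a dependency could not be regrouped this way.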

II-1: CPU, threads (thread)
OpenMP
OS: Grand Central Dispatch (MacOS X 10.6, FreeBSD)
Intel TBB, Google Go, Rust

II-2: OpenMP, Fortran example:
    program hello
    ...
    !$omp parallel
    !$omp end parallel
    ...
    end
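The fork and join structure of a thread-parallel region can be sketched with Python's standard thread pool. This is not OpenMP and not the lecture's code: the function names and the default of 4 workers are assumptions for illustration. Note also that in CPython the global interpreter lock prevents CPU-bound threads from running simultaneously, so the sketch shows the structure rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds):
    """Sum the integers in the half-open block [lo, hi)."""
    lo, hi = bounds
    return sum(range(lo, hi))


def threaded_sum(n, workers=4):
    """Sum 1..n (n >= 1) by giving each thread one contiguous block."""
    step = -(-n // workers)  # ceiling division
    blocks = [(lo, min(lo + step, n + 1)) for lo in range(1, n + 1, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:  # fork
        return sum(pool.map(partial_sum, blocks))          # join and reduce


print(threaded_sum(100))
```

All threads live in one process and see the same memory, which is the shared-memory model that OpenMP also uses.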

III-1: messages (Message), (NEC), MPI (Message Passing Interface)

III-2: MPI
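The message-passing style can be contrasted with the shared-memory style by a small simulation in which workers touch only the data they receive as messages. The sketch below is not MPI: threads and standard-library queues stand in for separate processes and the network, and the names worker and message_passing_sum are hypothetical.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one block of numbers, then send back (rank, partial sum)."""
    block = inbox.get()              # blocking receive
    outbox.put((rank, sum(block)))   # send the result to the coordinator


def message_passing_sum(n, workers=4):
    """Sum 1..n by scattering blocks to workers and gathering partial sums."""
    inboxes = [queue.Queue() for _ in range(workers)]
    outbox = queue.Queue()
    threads = [threading.Thread(target=worker, args=(r, inboxes[r], outbox))
               for r in range(workers)]
    for t in threads:
        t.start()
    numbers = list(range(1, n + 1))
    for r in range(workers):
        inboxes[r].put(numbers[r::workers])               # scatter
    total = sum(outbox.get()[1] for _ in range(workers))  # gather and reduce
    for t in threads:
        t.join()
    return total


print(message_passing_sum(100))
```

Every exchange of data is an explicit send or receive here, which is what lets the same program structure span machines that do not share memory.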

SIMD (1)
Julia lang:
    @simd for i = 1:length(x)
        @inbounds s += x[i] * y[i]
    end
C/C++ on SX-ACE:
    #pragma vdir nodep
    #pragma vdir...
    for (i = 0; i < n; i++) {   /* n: length of x and y */
        s += x[i] * y[i];
    }
[Figure: computation speed (GFlops), normal versus SIMD]
SIMD (4 core CPU): 5 times!!
* PC (vaio, 4core), 1000, 100000.

OpenMP/MPI (1) Parallel
Julia lang:
    nheads = @parallel (+) for i = 1:200000000
        Int(rand(Bool))
    end
C/C++ on SX-ACE: compile option -Pauto
[Figure: computation speed (Gops), normal versus parallel]
Parallel (4 core CPU): 33 times!!
* PC (vaio, 4core).

OpenMP/MPI (2)
2017, 20, SX-ACE, SIMD, OpenMP. : 5, : 5
SIMD, OpenMP

OpenMP/MPI (3) Parallel: Black-Scholes (by Julia language)
ParallelAccelerator (Intel Labs):
    using ParallelAccelerator
    @acc begin
    ...
    end
http://julialang.org/blog/2016/03/parallelaccelerator
(36 core CPU): 130 times!!

MPI (SX-ACE)
1 node (1 cpu, 4 core); PC 4; 4; MPI

Thank You!