2018.06.04 2018.06.04 1 / 62
Windows, Mac Unix
Part I Unix GUI CUI: Unix, Windows, Mac OS
Part II
6/4
6/5
6/19 SX-ACE
6/22 (for OCTOPUS, VCC)
6/26 SX-ACE (MPI)
6/29 SX-ACE (HPF)
8/23 Gaussian
http://www.hpc.cmc.osaka-u.ac.jp/lecture event/lecture/
Part I: UNIX
CUI
GUI, CUI and the OS
OS (Windows, MacOS X, Unix)
GUI (Graphical User Interface)
CUI (Character User Interface) / CLI (Command Line Interface)
Unix = CUI/CLI
GUI CUI
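Tips for understanding the CUI, I
A GUI is a map; a CUI is a photograph.
GUI = map, CUI = photograph
In a CUI, what matters is where you are right now.
Basically, you walk to the destination yourself (= move with commands such as cd).
furihata@cmc.osaka-u.ac.jp (Cybermedia Center, Osaka University: Fundamentals of Parallel Programming Leading to Supercomputers)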
Tips for understanding the CUI, II
CUI = (shell)
Tips for understanding the CUI, III
CUI, Unix, ssh
Tips for understanding the CUI, IV
CUI, Emacs, vi
Unix
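Unix: directory operations
pwd            show the current directory
cd ..          move up to the parent directory
cd hoge        move into the directory hoge
mkdir hoge     create the directory hoge
rmdir hoge     remove the (empty) directory hoge
mv hoge poko   rename the directory hoge to poko, or move hoge into poko if poko already exists as a directory

Unix: file operations
ls             list the files in the current directory
touch hoge     create an empty file hoge (if it does not exist yet)
rm hoge        delete the file hoge
mv hoge poko   rename the file hoge to poko, or move hoge into poko if poko is a directory

Unix: file contents
less hoge      view the contents of hoge (see also more, cat)
grep kore      show the lines that contain the string kore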
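Unix (editors): Emacs
emacs hoge     open the file hoge in Emacs
(In Emacs key notation, C- means holding Ctrl and M- means pressing Esc.)
C-x C-f        open a file
C-x C-s        save
C-x C-c        quit
C-g            cancel the current command
C-s hoge       search for hoge
Copy and paste in Emacs:
M-w            copy
C-w            cut
C-y            paste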
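Unix (editors): vi
vi hoge        open the file hoge in vi
i              enter insert mode
Esc            return to command mode
h, j, k, l     move the cursor left, down, up, right
:wq            save and quit
:q!            quit without saving
x, dd          delete one character, delete one line
yy             copy (yank) one line
p              paste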
Emacs + vi = spacemacs: first release 2014.10; ver. 0.200.13.
Unix = CUI; CUI = (shell); Emacs, vi
Part II
Part ( ) GO
Vector (SIMD) type: SX-9 (NEC): 100 GFlops per CPU. * 2007.
K computer (Top500 no. 1 in 2011.06-12, no. 10 as of 2017.11): 10.6 PFlops, 705,024 cores.
PrimeHPC FX10: 23.3 PFlops, 1,572,864 cores.
Sunway TaihuLight (Top500 no. 1 in 2016.06-2017.11): 93 PFlops, 10,649,600 cores.
Memory: K computer: 1.26 PB, Sunway TaihuLight: 1.3 PB. * PrimeHPC FX10 spec: 6 PB.
Power: K computer: 12.7 MW, Sunway TaihuLight: 15.3 MW (Tianhe-2: 17.8 MW). * 27, 13 MW 3

SX-ACE: 3 clusters, 423 TFlops = 0.423 PFlops, 6144 cores. * In the June 2017 TOP500, rank 500 was 549 TFlops.
Memory: 96 TB = 0.1 PB.
Power: 700 kW. * 1000 4 5 %. 8%
I
(image: Creative Commons Attribution-ShareAlike 3.0, by A.I.Graphic)

II
ILLIAC I, II, III (1952, 1962, SIMD 1966)
Cray-1 (1976, 80-160 MFLOPS)
1980s
Earth Simulator (SX-5, 41 TFLOPS, 2002; SX-9, 131 TFLOPS, 2009): TOP500 no. 1 in 2002.06-2004.06 ("Japanese Computenik")
Blue Gene (2004): 32,768 cores; 212,992 cores in 2007
K computer (2011): 10.62 PFLOPS
Tianhe-2 (2013): 33.8 PFLOPS
Sunway TaihuLight (2016): 93 PFLOPS

III
United States (143 of the Top500 as of 2017.11): Cray, IBM
Japan (35): NEC
China (202)
Germany (21), France (18), United Kingdom (15) (EU total: 86)
I
Input Data -> Operation -> Output Data

II
Input -> Operation -> Output
1 + 2 + 3 + ... + 100 = 5050
for i := 1 to 100 do result += i;
or
(100 + 1) * 100 / 2
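The two ways of computing 1 + 2 + ... + 100 on this slide can be compared directly. The following is a minimal Python sketch; the function names `sum_by_loop` and `sum_by_formula` are illustrative and do not come from the slides. The loop performs n additions, while the closed form needs only a constant number of operations, so the choice of operation matters before any hardware speed-up is considered.

```python
def sum_by_loop(n):
    """Add 1, 2, ..., n one term at a time (n additions)."""
    result = 0
    for i in range(1, n + 1):
        result += i
    return result


def sum_by_formula(n):
    """Use the closed form (n + 1) * n / 2 (constant number of operations)."""
    return (n + 1) * n // 2


if __name__ == "__main__":
    print(sum_by_loop(100))     # 5050
    print(sum_by_formula(100))  # 5050
```

Both functions return the same value for every non-negative n; only the amount of work differs.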
III-1
CPU, HDD
Input -> Fast Operation -> Output

III-2
[Figure: CPU clock (MHz, logarithmic axis from 0.1 to 10000) by year, 1970 to 2015]

IV-1
SIMD (Single Instruction Multiple Data)
Input -> Vector Operation -> Output

IV-2

V-1
Input -> Parallel Operation -> Output

V-2

VI-1
Input -> Parallel Operation -> Output

VI-2

VII

VIII
NEC SX, PC, GPU, SIMD
I
With parallelizable fraction $a$ (serial fraction $1 - a$), $n$ processors, and overhead ratio $\delta$, the speed-up ratio is
$$\text{speed-up ratio} = \frac{1}{\dfrac{(1+\delta)\,a}{n} + (1-a)}$$
[Figure: speed-up ratio versus number of processors (up to 100 and up to 1000) for a = 50%, 80%, 90%, 95%, 99% with delta = 0%]

II
[Figure: the same plots with delta = 50% and delta = 200%]
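The curves on these slides can be reproduced numerically. The sketch below assumes the speed-up formula reads 1 / ((1 + delta) * a / n + (1 - a)), where a is the parallelizable fraction, n the number of processors and delta an overhead ratio on the parallel part; this reading is reconstructed from the slide and the function name `speedup` is hypothetical.

```python
def speedup(a, n, delta=0.0):
    """Speed-up ratio 1 / ((1 + delta) * a / n + (1 - a)).

    a:     fraction of the work that can run in parallel (0..1)
    n:     number of processors (>= 1)
    delta: overhead ratio added to the parallel part (assumed meaning)
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 + delta) * a / n + (1.0 - a))


if __name__ == "__main__":
    # Same parameter grid as the plots: five values of a, three of delta.
    for delta in (0.0, 0.5, 2.0):
        for a in (0.5, 0.8, 0.9, 0.95, 0.99):
            print(f"a={a:.0%} delta={delta:.0%} "
                  f"n=100: {speedup(a, 100, delta):6.2f} "
                  f"n=1000: {speedup(a, 1000, delta):6.2f}")
```

Under this formula the speed-up can never exceed 1 / (1 - a), however many processors are used: with a = 90% the limit is 10, which is why the serial fraction dominates at large n.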
III
MPI (Message Passing Interface), OpenMP (Multi Processing), SIMD, CUDA.

I
SIMD
for i := 1 to 10000 do a[i] := 2*i;

II-1
CPU / threads: OpenMP
OS: Grand Central Dispatch (MacOS X 10.6, FreeBSD)
Intel TBB, Google Go, Rust
II-2
OpenMP in Fortran:
    program hello
    ...
    !$omp parallel
    !$omp end parallel
    ...
    end
III-1
(Message) (NEC) MPI (Message Passing Interface)
III-2
MPI
SIMD (1)
Julia lang:
    @simd for i=1:length(x)
        @inbounds s += x[i] * y[i]
    end
C/C++ on SX-ACE:
    #pragma vdir nodep
    #pragma vdir...
    /* n is the number of elements; C arrays start at index 0 */
    for (i = 0; i < n; i++) {
        s += x[i] * y[i];
    }
[Figure: computation speed in GFlops, normal versus SIMD]
SIMD (4-core CPU): 5x faster!!
* PC (vaio, 4 core), 1000, 100000.
OpenMP/MPI (1)
Parallel, Julia lang:
    nheads = @parallel (+) for i=1:200000000
        Int(rand(Bool))
    end
C/C++ on SX-ACE: compile option -Pauto
[Figure: computation speed in Gops, normal versus parallel]
Parallel (4-core CPU): 33x faster!!
* PC (vaio, 4 core).
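The Julia loop on this slide is a parallel reduction: each worker counts heads in its own share of the coin flips, and the partial counts are combined with +. The following Python sketch illustrates that pattern with the standard library only. It is not how Julia implements `@parallel`; the names `count_heads`, `split_work` and `parallel_heads`, the default of 4 workers, and the per-chunk seeding are all assumptions made for this illustration.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_heads(task):
    """Flip n fair coins with a private generator and count the heads."""
    n, seed = task
    rng = random.Random(seed)  # one generator per chunk, no shared state
    return sum(rng.getrandbits(1) for _ in range(n))


def split_work(total, workers):
    """Split `total` iterations into `workers` chunk sizes that add up to total."""
    base, extra = divmod(total, workers)
    return [base + (1 if k < extra else 0) for k in range(workers)]


def parallel_heads(total, workers=4, seed=0):
    """Count heads in `total` flips using `workers` processes.

    Each process counts one chunk; the partial counts are then added,
    which plays the role of the (+) reduction.
    """
    tasks = [(n, seed + k) for k, n in enumerate(split_work(total, workers))]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_heads, tasks))
```

To try it, call for example `parallel_heads(2_000_000)` from a script, inside an `if __name__ == "__main__":` guard so that worker processes can be started safely on every platform. The result should be close to half the number of flips.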
OpenMP/MPI (2)
2017 20 SX-ACE SIMD OpenMP. : 5, : 5 - - SIMD, OpenMP

OpenMP/MPI (3)
Parallel: Black-Scholes (by Julia language)
using ParallelAccelerator (Intel Labs)
    @acc begin ... end
http://julialang.org/blog/2016/03/parallelaccelerator
(36-core CPU): 130x faster!!

MPI (SX-ACE) 1 (1 cpu, 4 core) PC 4 4 MPI

Thank You!