Lecture I — 2013.07.18 (47 slides)
Contents: hybrid MPI + OpenMP parallelization / Appendix A: flat MPI / Appendix B / Appendix C: two-dimensional domain decomposition
Resource groups on the π-computer (#PJM -L "rscgrp=..."):
  small:  12 nodes
  large:  84 nodes (84 nodes × 16 cores = 1344 cores)
  school: 24 nodes
Use small or school for the exercises.
Hybrid parallelization: MPI + OpenMP. MPI distributes the work across the nodes; within each node, OpenMP parallelizes the do-loops in fork–join style. On the π-computer each node has 16 cores.
[Figure: hybrid MPI + OpenMP. Processors 0 .. N-1, each with cores 00 .. 15. One MPI process per processor (MPI process 0 on processor 0, ..., MPI process N-1 on processor N-1). Inside each MPI process, Fork & Join spawns OpenMP threads 00 .. 15 on that processor's cores.]
Measuring elapsed (wall-clock) time, as opposed to CPU time:
  - Fortran95: cpu_time()       — CPU time
  - MPI:       MPI_WTIME()      — wall-clock time
  - OpenMP:    omp_get_wtime()  — wall-clock time
  - Fortran90: system_clock()   — wall-clock time
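The same CPU-time vs wall-clock-time distinction can be demonstrated outside Fortran. A minimal sketch in Python (used here only for illustration; the course code uses the Fortran/MPI/OpenMP timers listed above — `perf_counter` plays the role of system_clock()/MPI_WTIME(), `process_time` the role of cpu_time()):

```python
import time

def timed(fn, *args):
    """Return (result, wall_seconds, cpu_seconds) for one call of fn.

    time.perf_counter() is a wall clock; time.process_time() counts
    CPU time consumed by this process only.
    """
    w0, c0 = time.perf_counter(), time.process_time()
    result = fn(*args)
    w1, c1 = time.perf_counter(), time.process_time()
    return result, w1 - w0, c1 - c0

if __name__ == "__main__":
    res, wall, cpu = timed(sum, range(1_000_000))
    print(f"result={res} wall={wall:.6f}s cpu={cpu:.6f}s")
```

For a program that sleeps or waits on communication, wall time keeps growing while CPU time does not — which is why MPI codes are profiled with wall-clock timers.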
heat5.f90 (heat4.f90 plus a system_clock()-based stopwatch module):

  !! heat5.f90
  !    + module stopwatch, to monitor time.
  !    + many calls to stopwatch__stt and __stp.
  !    - data output calls for profile_1d and 2d (commented out.)
  !!
  !! usage (on pi-computer)
  !!   1) mkdir ../data (unless there is already.)
  !!   2) mpifrtpx -O3 heat5.f90 (copy un to u is slow in default.)
  !!   3) pjsub heat5.sh
Notes on heat5.f90: compile with -O3 (see the usage comment above) — without optimization the whole-array copy u(1:ngrid,jj%stt:jj%end) = un(1:ngrid,jj%stt:jj%end) is slow. That copy is timed separately as the stopwatch item "copy un to u".
Example job output:

  ##################################################
   job start at Tue Jul 16 21:07:29 JST 2013
  ##################################################
  # myrank= 3  jj%stt & jj%end =  751 1001
  # myrank= 0  jj%stt & jj%end =    1  250
  # myrank= 2  jj%stt & jj%end =  501  750
  # myrank= 1  jj%stt & jj%end =  251  500
  //=============<stop watch>===============\\
     profile 1d:   0.000 sec
      main loop:   8.334 sec
   mpi sendrecv:   0.409 sec
         jacobi:   4.103 sec
   copy un to u:   3.799 sec
  ---------------------------------------
          Total:   8.386 sec
  \\=============<stop watch>===============//
  ##################################################
   job end at Tue Jul 16 21:07:39 JST 2013
Adding OpenMP: heat6.f90.

  !! heat6.f90
  !   + OpenMP (now this is a hybrid parallel code, with MPI.)
  !   - array calc of u(:,:)=un(:,:). see below.
  !   + double do-loops of u(i,j)=un(i,j), for OpenMP.
  ! usage (on pi-computer)
  !   1) mkdir ../data (unless there is already.)
  !   2) mpifrtpx -Kopenmp heat6.f90
  !   3) pjsub heat6.sh
OpenMP directives on the Jacobi loop:

  program main
  !$ use omp_lib

  !$omp parallel do
    do j = jj%stt, jj%end
      do i = 1, NGRID
        un(i,j) = (u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))*0.25_dp + heat_h
      end do
    end do
  !$omp end parallel do
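The same update, sketched in Python for readers unfamiliar with Fortran (names u, un, jj%stt/jj%end, heat_h follow the slide; the ghost rows/columns outside a process's block are assumed to have been filled by the MPI sendrecv step — this is an illustration, not the course code):

```python
def jacobi_sweep(u, un, jstt, jend, ngrid, heat_h):
    """One Jacobi sweep: un(i,j) = mean of u's four neighbours + heat_h.

    u and un are (ngrid+2) x (jend+2) nested lists, indexed u[i][j],
    with ghost cells at i = 0, i = ngrid+1 and outside jstt..jend.
    """
    for j in range(jstt, jend + 1):
        for i in range(1, ngrid + 1):
            un[i][j] = (u[i-1][j] + u[i+1][j]
                        + u[i][j-1] + u[i][j+1]) * 0.25 + heat_h
```

Because each (i, j) of un depends only on the old array u, the iterations are independent — exactly why the j-loop can be handed to OpenMP threads with `!$omp parallel do`.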
The array copy, rewritten as double do-loops so that OpenMP can parallelize it:

  ! u(1:ngrid,jj%stt:jj%end) = un(1:ngrid,jj%stt:jj%end)
  !$omp parallel do
    do j = jj%stt, jj%end
      do i = 1, NGRID
        u(i,j) = un(i,j)
      end do
    end do
  !$omp end parallel do
Compile with mpifrtpx -Kopenmp heat6.f90, edit heat6.sh (resource group school), then submit with pjsub heat6.sh.
heat6.sh:

  #!/bin/bash
  #PJM -N "heat6"
  #PJM -L "rscgrp=small"
  #PJM -L "node=4"
  #PJM -L "elapse=02:00"
  #PJM -j
  export FLIB_CNTL_BARRIER_ERR=FALSE
  ..
  for opn in 1 2 4 8 16
  do
    export OMP_NUM_THREADS=$opn
    echo "# omp_num_threads = " $opn
    mpiexec -n 4 ./a.out
  done
  ..
Quiz: the equation (…) − 2x = 0 has a non-zero root x = β. Which is β? (a) 0.293 (b) 0.346 (c) 0.432. Hint (gnuplot): plot y = (…) together with y = 2x and read off the intersection; narrow the view with set xrange [xmin:xmax].
A break: toys hidden in Emacs. Try M-x animate (enter your first name), M-x zone, and the text adventure M-x dunnet. Some dunnet commands: help, get shovel, look shovel, e, e, dig, look, get cpu, ...
Exercise 1: run heat6.f90 while varying the number of OpenMP threads M (up to 16) and the number of MPI processes N (up to 84), and plot the speedup S against the total core count P (= M × N) with gnuplot. Take S from the "Total" time reported by the stopwatch module. Adjust NGRID as needed.
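The speedup asked for in the exercise is the usual ratio S = T(1) / T(P), with T taken from the stopwatch "Total" line. A small sketch of the computation (the timing numbers below are hypothetical, purely for illustration):

```python
def speedup(t1, tp):
    """Speedup S = T(1) / T(P)."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Parallel efficiency E = S / P (1.0 means ideal scaling)."""
    return speedup(t1, tp) / p

# Hypothetical "Total" times (seconds), keyed by core count P = M * N:
timings = {1: 30.0, 4: 8.4, 16: 2.6}
for p, tp in sorted(timings.items()):
    print(p, round(speedup(timings[1], tp), 2),
          round(efficiency(timings[1], tp, p), 2))
```

Writing the (P, S) pairs to a text file and plotting them with gnuplot's `plot "file" with linespoints` gives the figure the report asks for.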
Report: include (a) …, (b) the run conditions (NGRID, N, M = OMP_NUM_THREADS), (c) the gnuplot plot, (d) …, (e) … . Send one PDF to kageyama.lecture@gmail.com with subject "130718" and a file name of the form 130718_120x227x_Yamada.pdf. Deadline: 2013-07-25, 24:00.
Appendix A: flat MPI
Flat MPI: instead of running one MPI process per node and filling the node's 16 cores with OpenMP threads, run one MPI process on every core — no OpenMP at all. For example, 4 nodes × 16 cores = 64 MPI processes. This style is called flat MPI.
[Figure: flat MPI. Processor 0 (cores 00 .. 15) runs MPI processes 0 .. 15; processor 1 runs MPI processes 16 .. 31; ...; processor N-1 runs MPI processes 16N-16 .. 16N-1. No OpenMP threads.]
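The rank layout in the figure follows a simple rule: with 16 cores per processor, the MPI process on core c of processor n gets rank 16n + c. A sketch as a check (function name is mine, for illustration):

```python
CORES = 16  # cores per processor on the pi-computer

def flat_mpi_rank(proc, core, cores=CORES):
    """Flat MPI: one MPI process per core, numbered rank = proc*cores + core."""
    return proc * cores + core

# Processor 1 hosts ranks 16 .. 31; processor N-1 hosts 16N-16 .. 16N-1,
# exactly as in the figure.
print(flat_mpi_rank(1, 0), flat_mpi_rank(1, 15))  # → 16 31
```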
Appendix B
Appendix C: two-dimensional domain decomposition
[Slide: the decomposition used so far is one-dimensional — the domain is cut into strips, one per process (16 in the examples).]
[Slide: a two-dimensional decomposition splits the same domain in both directions among the 16 processes.]
[Slide: communication volume of the 1-D decomposition; figures on the slide: 61, 100, 3721 (= 61 × 61).]
[Slide: 1-D vs 2-D decomposition — how the communication per process scales with NGRID.]
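The general point behind comparing 1-D and 2-D decomposition: per interior process, a 1-D slab exchanges two full rows of NGRID points, while a square block in a √P × √P process grid exchanges four sides of NGRID/√P points each. A counting sketch (one value per boundary point; assumes P is a perfect square dividing NGRID — an illustration, not the course code):

```python
import math

def halo_points_1d(ngrid, nprocs):
    """Interior process of a 1-D slab decomposition: two rows of ngrid
    points, regardless of nprocs."""
    return 2 * ngrid

def halo_points_2d(ngrid, nprocs):
    """Interior process of a sqrt(P) x sqrt(P) block decomposition:
    four sides of ngrid // sqrt(P) points each."""
    side = ngrid // math.isqrt(nprocs)
    return 4 * side

print(halo_points_1d(1000, 16), halo_points_2d(1000, 16))  # → 2000 1000
```

As P grows, the 1-D cost stays at 2·NGRID per process while the 2-D cost shrinks like 4·NGRID/√P — the motivation for this appendix.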
[Slide: MPI communication pattern of the 1-D decomposition.]
[Slide: implementing the 2-D decomposition in two steps: 1. each process computes its own sub-domain; 2. MPI communication in both directions.]
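Step 1 — computing each rank's sub-domain — can be sketched as follows, assuming a row-major px × py process grid and NGRID divisible by px and py (all names here are mine, not from the course code):

```python
def block_bounds(rank, ngrid, px, py):
    """Return (istt, iend, jstt, jend), 1-based inclusive bounds, for a
    rank in a row-major px-by-py process grid over an ngrid x ngrid domain."""
    cx, cy = rank % px, rank // px       # process coordinates in the grid
    nx, ny = ngrid // px, ngrid // py    # local block size
    istt, jstt = cx * nx + 1, cy * ny + 1
    return istt, istt + nx - 1, jstt, jstt + ny - 1

print(block_bounds(5, 1000, 4, 4))  # → (251, 500, 251, 500)
```

Compare the 1-D case in the job output earlier, where only jj%stt and jj%end varied per rank; here every rank gets bounds in both i and j.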
[Figure slides: worked example of the 2-D decomposition with 4 MPI processes.]
[Slide: MPI communication in the 2-D decomposition.]
[Slides: in a 3 × 3 process grid, process 4 exchanges boundary data with its four edge neighbours (ranks 1, 3, 5, 7) and also needs the corner points held by its diagonal neighbours (ranks 0, 2, 6, 8).]
A convenient MPI helper for this bookkeeping: MPI_CART_CREATE, which builds a Cartesian process topology (see the MPI references).
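MPI_CART_CREATE (together with MPI_CART_COORDS and MPI_CART_SHIFT) hides the rank-to-coordinate arithmetic. For a row-major, non-periodic px × py grid, that arithmetic looks like this (a plain-Python sketch of what the MPI calls encapsulate; None plays the role MPI_PROC_NULL plays at the domain edges):

```python
def cart_neighbors(rank, px, py):
    """Edge neighbours (west, east, south, north) of a rank in a
    row-major, non-periodic px-by-py process grid."""
    cx, cy = rank % px, rank // px
    def r(x, y):
        return y * px + x if 0 <= x < px and 0 <= y < py else None
    return r(cx - 1, cy), r(cx + 1, cy), r(cx, cy - 1), r(cx, cy + 1)

print(cart_neighbors(4, 3, 3))  # centre of a 3x3 grid → (3, 5, 1, 7)
```

In the real code one passes the neighbour ranks (or MPI_PROC_NULL) straight to MPI_SENDRECV, so boundary processes need no special-casing.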
[Closing slide: 1-D and 2-D decomposition.]