Microsoft PowerPoint - omp-03.ppt [Compatibility Mode]

1 Parallel Programming for Multicore Processors using OpenMP
Part III: Parallel Version + Exercise
Kengo Nakajima, Information Technology Center
Programming for Parallel Computing ( )
Seminar on Advanced Computing ( )

2 OMP-3 1 Parallel Version: OpenMP
OpenMP version of -sol
The number of threads (= PEsmpTOT) can be controlled in the program.
Fundamental idea: meshes in the same color/level are independent, so they can be processed in parallel (concurrently).

3 OMP-3 Colors, 4 Threads: Initial Mesh (figure)

5 OMP-3 Colors, 4 Threads: Renumbering according to Color ID (figure)

6 OMP-3 Colors, 4 Threads (figure)
Meshes in the same color/level are independent, so they can be processed in parallel. Within each color, the renumbered meshes are assigned to threads #0, #1, #2 and #3.

7 OMP-3 6 Files on FX10
    >$ cd <$O-TOP>
    >$ cp /home/ss/aics60//multicore-c.tar .
    >$ cp /home/ss/aics60/f/multicore-f.tar .
    >$ tar xvf multicore-c.tar
    >$ tar xvf multicore-f.tar
    >$ cd multicore
Confirm the following directories: L3 omp <$O-L3>, <$O-stream>

8 OMP-3 7 Files on FX10 (cont.)
Location: <$O-L3>/src, <$O-L3>/run
Compile/Run
  Main part:
    cd <$O-L3>/src
    make
    <$O-L3>/run/L3-sol (exec)
  Control data: <$O-L3>/run/INPUT.DAT
  Batch job script: <$O-L3>/run/go1.sh

9 OMP-3 8 Running the Code
    % cd <$O-L3>
    % ls
    run src src0 reorder0
    % cd src
    % make
    % cd ../run
    % ls L3-sol
    L3-sol
    % <modify INPUT.DAT>
    % <modify go1.sh>
    % pjsub go1.sh

10 OMP-3 9 Running the Program (figure)
L3-sol: Poisson solver (FVM)
INPUT.DAT: control file
test.inp: ParaView file

11 OMP-3 10 Control Data: INPUT.DAT
    (values lost in extraction)   NX/NY/NZ
    1.00e e e-00 (garbled)        DX/DY/DZ
    1.0e-08                       EPSICCG
    16                            PEsmpTOT
    100                           NCOLORtot

NX, NY, NZ: number of meshes in the X/Y/Z directions
DX, DY, DZ: size of meshes
EPSICCG: convergence criterion for ICCG
PEsmpTOT: number of threads
NCOLORtot: reordering method + initial number of colors/levels
    >= 2: MC, = 0: CM, = -1: RCM, <= -2: CM-RCM

12 OMP-3 11 go1.sh
    #!/bin/sh
    #PJM -L "node=1"
    #PJM -L "elapse=00:10:00"
    #PJM -L "rscgrp=lecture"
    #PJM -g "gt71"
    #PJM -j
    #PJM -o "arcm.lst"
    export OMP_NUM_THREADS=16
    ./L3-sol
OMP_NUM_THREADS must be equal to PEsmpTOT.
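The requirement that `OMP_NUM_THREADS` matches `PEsmpTOT` can be checked at run time. The following is a minimal sketch, not part of the course code: the module `thread_check` and the function `threads_match` are hypothetical names. The lines starting with `!$` are compiled only when OpenMP is enabled, so the module also builds as serial code.

```fortran
module thread_check
  !$ use omp_lib
  implicit none
contains
  ! Returns .true. when the OpenMP runtime offers exactly PEsmpTOT threads.
  ! Without OpenMP the program is serial, so only PEsmpTOT = 1 matches.
  logical function threads_match(PEsmpTOT)
    integer, intent(in) :: PEsmpTOT
    integer :: nthreads
    nthreads = 1
    !$ nthreads = omp_get_max_threads()
    threads_match = (nthreads == PEsmpTOT)
  end function threads_match
end module thread_check
```

A caller could stop with a message when `threads_match(PEsmpTOT)` is false, right after INPUT.DAT has been read.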

13 OMP-3 12
Applying OpenMP to -sol
Examples
Optimization + Exercise

14 OMP-3 13 Applying OpenMP to -sol
On the ICCG solver:
Dot products, DAXPY, Mat-Vec: no data dependency, so just insert directives.
Preconditioning (IC factorization, forward/backward substitution): no data dependency within the same color, so meshes in the same color can be processed in parallel.

15 OMP-3 14 Just inserting directives works fine, but... (1/2)
(Mat-Vec)
!$omp parallel do private(i,VAL,k)
      do i = 1, N
        VAL= D(i)*W(i,P)
        do k= indexl(i-1)+1, indexl(i)
          VAL= VAL + AL(k)*W(itemL(k),P)
        enddo
        do k= indexu(i-1)+1, indexu(i)
          VAL= VAL + AU(k)*W(itemU(k),P)
        enddo
        W(i,Q)= VAL
      enddo
!$omp end parallel do
With this approach the number of threads cannot be controlled from within the program.

16 OMP-3 15 Just inserting directives works fine, but... (2/2)
(Forward Substitution)
      do icol= 1, NCOLORtot
!$omp parallel do private (i,VAL,k)
        do i= COLORindex(icol-1)+1, COLORindex(icol)
          VAL= D(i)
          do k= indexl(i-1)+1, indexl(i)
            VAL= VAL - (AL(k)**2) * DD(itemL(k))
          enddo
          DD(i)= 1.d0/VAL
        enddo
!$omp end parallel do
      enddo
With this approach the number of threads cannot be controlled from within the program.

17 OMP-3 16 Parallelize ICCG Method by OpenMP
Dot Product: OK
DAXPY: OK
Matrix-Vector Multiply: OK
Preconditioning

18 OMP-3 17 Main Program (1/2)
      program MAIN
      use STRUCT
      use PCG
      use solver_ICCG_mc
      implicit REAL*8 (A-H,O-Z)
      real(kind=8), dimension(:), allocatable :: WK

      call INPUT
      call POINTER_INIT
      call BOUNDARY_CELL
      call CELL_METRICS
      call POI_GEN

      PHI= 0.d0
      call solve_ICCG_mc                                            &
     &   ( ICELTOT, NPL, NPU, indexl, iteml, indexu, itemu, D,      &
     &     BFORCE, PHI, AL, AU, NCOLORtot, PEsmpTOT,                &
     &     SMPindex, SMPindexG, EPSICCG, ITR, IER)

19 OMP-3 18 Main Program (2/2)
      allocate (WK(ICELTOT))
      do ic0= 1, ICELTOT
        icel= NEWtoOLD(ic0)
        WK(icel)= PHI(ic0)
      enddo
      do icel= 1, ICELTOT
        PHI(icel)= WK(icel)
      enddo

      call OUTUCD
      stop
      end
The two loops renumber PHI from the new (reordered) numbering back to the original numbering.
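The renumbering step at the end of the main program can be tried in isolation. This is a small sketch under stated assumptions: the module `renumber_demo` and the subroutine `to_original` are illustrative names, and the four-cell permutation used in the usage note is made up for the example.

```fortran
module renumber_demo
  implicit none
contains
  ! Scatter a vector stored in NEW (reordered) numbering back to the
  ! OLD (original) numbering, as done with WK in the main program.
  subroutine to_original(n, NEWtoOLD, phi)
    integer, intent(in) :: n
    integer, intent(in) :: NEWtoOLD(n)
    real(kind=8), intent(inout) :: phi(n)
    real(kind=8) :: wk(n)
    integer :: ic0
    do ic0 = 1, n
      wk(NEWtoOLD(ic0)) = phi(ic0)
    enddo
    phi = wk
  end subroutine to_original
end module renumber_demo
```

For a four-cell chain colored red/black, the new order is old cells 1, 3, 2, 4, so `NEWtoOLD = [1,3,2,4]`; a vector `[10,30,20,40]` in new order comes back as `[10,20,30,40]`.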

21 OMP-3 20 module STRUCT
      module STRUCT
      use omp_lib
      include 'precision.inc'

!C-- METRICs & FLUX
      integer (kind=kint) :: ICELTOT, ICELTOTp, N
      integer (kind=kint) :: NX, NY, NZ, NXP1, NYP1, NZP1, IBNODTOT
      integer (kind=kint) :: NXc, NYc, NZc
      real (kind=kreal) ::                                          &
     &     DX, DY, DZ, XAREA, YAREA, ZAREA, RDX, RDY, RDZ,          &
     &     RDX2, RDY2, RDZ2, R2DX, R2DY, R2DZ
      real (kind=kreal), dimension(:), allocatable ::               &
     &     VOLCEL, VOLNOD, RVC, RVN
      integer (kind=kint), dimension(:,:), allocatable ::           &
     &     XYZ, NEIBcell

!C-- BOUNDARYs
      integer (kind=kint) :: ZmaxCELtot
      integer (kind=kint), dimension(:), allocatable :: BC_INDEX, BC_NOD
      integer (kind=kint), dimension(:), allocatable :: ZmaxCEL

!C-- WORK
      integer (kind=kint), dimension(:,:), allocatable :: IWKX
      real(kind=kreal), dimension(:,:), allocatable :: FCV

      integer (kind=kint) :: PEsmpTOT
      end module STRUCT

ICELTOT: number of meshes (NX x NY x NZ)
N: number of nodes
NX, NY, NZ: number of meshes in the x/y/z directions
NXP1, NYP1, NZP1: number of nodes in the x/y/z directions
IBNODTOT: = NXP1 x NYP1
XYZ(ICELTOT,3): location of meshes
NEIBcell(ICELTOT,6): neighboring meshes
PEsmpTOT: number of threads

22 OMP-3 21 module PCG
      module PCG
      integer, parameter :: N2= 256
      integer :: NUmax, NLmax, NCOLORtot, NCOLORk, NU, NL
      integer :: NPL, NPU
      integer :: METHOD, ORDER_METHOD
      real(kind=8) :: EPSICCG
      real(kind=8), dimension(:), allocatable :: D, PHI, BFORCE
      real(kind=8), dimension(:), allocatable :: AL, AU
      integer, dimension(:), allocatable :: INL, INU, COLORindex
      integer, dimension(:), allocatable :: SMPindex, SMPindexG
      integer, dimension(:), allocatable :: OLDtoNEW, NEWtoOLD
      integer, dimension(:,:), allocatable :: IAL, IAU
      integer, dimension(:), allocatable :: indexl, iteml
      integer, dimension(:), allocatable :: indexu, itemu
      end module PCG

NCOLORtot: total number of colors/levels
COLORindex(0:NCOLORtot): index of the number of meshes in each color/level; color icol holds COLORindex(icol) - COLORindex(icol-1) meshes
SMPindex(0:NCOLORtot*PEsmpTOT), SMPindexG(0:PEsmpTOT): index arrays for OpenMP
OLDtoNEW, NEWtoOLD: reference tables before/after renumbering

23 OMP-3 22 Variables/Arrays for Matrix (1/2)
Name                        Type  Content
D(N)                        R     Diagonal components of the matrix (N= ICELTOT)
BFORCE(N)                   R     RHS vector
PHI(N)                      R     Unknown vector
indexl(0:N), indexu(0:N)    I     # of L/U non-zero off-diag. comp. (CRS)
NPL, NPU                    I     Total # of L/U non-zero off-diag. comp. (CRS)
iteml(NPL), itemu(NPU)      I     Column ID of L/U non-zero off-diag. comp. (CRS)
AL(NPL), AU(NPU)            R     L/U non-zero off-diag. comp. (CRS)

Name                        Type  Content
NL, NU                      I     Max. # of L/U non-zero off-diag. comp. for each mesh (=6)
INL(N), INU(N)              I     # of L/U non-zero off-diag. comp.
IAL(NL,N), IAU(NU,N)        I     Column ID of L/U non-zero off-diag. comp.

24 OMP-3 23 Variables/Arrays for Matrix (2/2)
Name                            Type  Content
NCOLORtot                       I     Input: reordering method + initial number of colors/levels
                                      (>= 2: MC, = 0: CM, = -1: RCM, <= -2: CM-RCM)
                                      Output: final number of colors/levels
COLORindex(0:NCOLORtot)         I     Number of meshes at each color/level (1D compressed array).
                                      Meshes in the icol-th color/level are stored from
                                      COLORindex(icol-1)+1 to COLORindex(icol).
NEWtoOLD(N)                     I     Reference array from new to old numbering
OLDtoNEW(N)                     I     Reference array from old to new numbering
PEsmpTOT                        I     Number of threads
SMPindex(0:NCOLORtot*PEsmpTOT)  I     Array for OpenMP operations (loops with data dependency)
SMPindexG(0:PEsmpTOT)           I     Array for OpenMP operations (loops without data dependency)
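To make the CRS arrays in the tables above concrete, here is a minimal sketch of a matrix-vector product that stores the diagonal in `D` and the strictly lower/upper parts in `indexl/iteml/AL` and `indexu/itemu/AU`. The module name `crs_matvec_demo` and the subroutine `matvec` are illustrative, and the directive is applied directly to the row loop (the "just insert directives" style), not through `SMPindexG`.

```fortran
module crs_matvec_demo
  implicit none
contains
  ! y = A*x with A split into diagonal D, lower (AL) and upper (AU) CRS parts.
  subroutine matvec(n, npl, npu, D, indexl, iteml, AL, indexu, itemu, AU, x, y)
    integer, intent(in) :: n, npl, npu
    real(kind=8), intent(in) :: D(n), AL(npl), AU(npu), x(n)
    integer, intent(in) :: indexl(0:n), iteml(npl)
    integer, intent(in) :: indexu(0:n), itemu(npu)
    real(kind=8), intent(out) :: y(n)
    integer :: i, k
    real(kind=8) :: val
!$omp parallel do private(i,val,k)
    do i = 1, n
      val = D(i)*x(i)
      do k = indexl(i-1)+1, indexl(i)      ! lower off-diagonals of row i
        val = val + AL(k)*x(iteml(k))
      enddo
      do k = indexu(i-1)+1, indexu(i)      ! upper off-diagonals of row i
        val = val + AU(k)*x(itemu(k))
      enddo
      y(i) = val
    enddo
!$omp end parallel do
  end subroutine matvec
end module crs_matvec_demo
```

For the 3x3 tridiagonal matrix with 2 on the diagonal and -1 off the diagonal, the arrays are `indexl = [0,0,1,2]`, `iteml = [1,2]`, `indexu = [0,1,2,2]`, `itemu = [2,3]`, and all off-diagonal values are -1.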

26 OMP-3 25 input: reading INPUT.DAT
!C
!C***
!C*** INPUT
!C***
!C
!C    INPUT CONTROL DATA
!C
      subroutine INPUT
      use STRUCT
      use PCG
      implicit REAL*8 (A-H,O-Z)
      character*80 CNTFIL
!C
!C-- CNTL. file
      open (11, file='input.dat', status='unknown')
        read (11,*) NX, NY, NZ
        read (11,*) DX, DY, DZ
        read (11,*) EPSICCG
        read (11,*) PEsmpTOT
        read (11,*) NCOLORtot
      close (11)

      return
      end

INPUT.DAT as shown on this slide:
    (values lost in extraction)   NX/NY/NZ
    1.00e e e-02 (garbled)        DX/DY/DZ
    1.00e-08                      EPSICCG
    16                            PEsmpTOT
    100                           NCOLORtot
PEsmpTOT: number of threads
NCOLORtot: reordering method + initial number of colors/levels
    >= 2: MC, = 0: CM, = -1: RCM, <= -2: CM-RCM

27 OMP-3 26 cell_metrics
!C
!C***
!C*** CELL_METRICS
!C***
!C
      subroutine CELL_METRICS
      use STRUCT
      use PCG
      implicit REAL*8 (A-H,O-Z)
!C
!C-- ALLOCATE
      allocate (VOLCEL(ICELTOT))
      allocate (   RVC(ICELTOT))
!C
!C-- VOLUME, AREA, PROJECTION etc.
      XAREA= DY * DZ
      YAREA= DX * DZ
      ZAREA= DX * DY

      RDX= 1.d0 / DX
      RDY= 1.d0 / DY
      RDZ= 1.d0 / DZ

      RDX2= 1.d0 / (DX**2)
      RDY2= 1.d0 / (DY**2)
      RDZ2= 1.d0 / (DZ**2)
      R2DX= 1.d0 / (0.50d0*DX)
      R2DY= 1.d0 / (0.50d0*DY)
      R2DZ= 1.d0 / (0.50d0*DZ)

      V0 = DX * DY * DZ
      RV0= 1.d0/V0

      VOLCEL= V0
      RVC   = RV0

      return
      end
(Figure: a mesh with edge lengths DX, DY, DZ; XAREA is the face normal to the x axis.)

29 OMP-3 28 poi_gen (1/9)
      subroutine POI_GEN
      use STRUCT
      use PCG
      implicit REAL*8 (A-H,O-Z)
!C
!C-- INIT.
      nn = ICELTOT
      nnp= ICELTOTp

      NU= 6
      NL= 6

      allocate (BFORCE(nn), D(nn), PHI(nn))
      allocate (INL(nn), INU(nn), IAL(NL,nn), IAU(NU,nn))

      PHI   = 0.d0
      D     = 0.d0
      BFORCE= 0.d0

      INL= 0
      INU= 0
      IAL= 0
      IAU= 0

30 OMP-3 29 poi_gen (2/9)
!C
!C-- CONNECTIVITY
      do icel= 1, ICELTOT
        icN1= NEIBcell(icel,1)
        icN2= NEIBcell(icel,2)
        icN3= NEIBcell(icel,3)
        icN4= NEIBcell(icel,4)
        icN5= NEIBcell(icel,5)
        icN6= NEIBcell(icel,6)

        if (icN5.ne.0.and.icN5.le.ICELTOT) then
          icou= INL(icel) + 1
          IAL(icou,icel)= icN5
          INL(     icel)= icou
        endif
        if (icN3.ne.0.and.icN3.le.ICELTOT) then
          icou= INL(icel) + 1
          IAL(icou,icel)= icN3
          INL(     icel)= icou
        endif
        if (icN1.ne.0.and.icN1.le.ICELTOT) then
          icou= INL(icel) + 1
          IAL(icou,icel)= icN1
          INL(     icel)= icou
        endif

        if (icN2.ne.0.and.icN2.le.ICELTOT) then
          icou= INU(icel) + 1
          IAU(icou,icel)= icN2
          INU(     icel)= icou
        endif
        if (icN4.ne.0.and.icN4.le.ICELTOT) then
          icou= INU(icel) + 1
          IAU(icou,icel)= icN4
          INU(     icel)= icou
        endif
        if (icN6.ne.0.and.icN6.le.ICELTOT) then
          icou= INU(icel) + 1
          IAU(icou,icel)= icN6
          INU(     icel)= icou
        endif
      enddo

Lower triangular part:
    NEIBcell(icel,5)= icel - NX*NY
    NEIBcell(icel,3)= icel - NX
    NEIBcell(icel,1)= icel - 1
Upper triangular part:
    NEIBcell(icel,2)= icel + 1
    NEIBcell(icel,4)= icel + NX
    NEIBcell(icel,6)= icel + NX*NY
(Figure: a mesh icel and its six neighbors NEIBcell(icel,1) to NEIBcell(icel,6).)
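The neighbor offsets listed above follow from a lexicographic mesh numbering. The sketch below assumes `icel = (k-1)*NX*NY + (j-1)*NX + i` for a mesh at position (i,j,k), which is consistent with the offsets on the slide but is not shown in the source; the module `neib_demo` and the subroutine `neighbors` are illustrative names. As in `poi_gen`, a value of 0 marks a missing neighbor.

```fortran
module neib_demo
  implicit none
contains
  ! Neighbor IDs of mesh (i,j,k) on an NX x NY x NZ structured grid.
  ! neib(1),(3),(5): lower-numbered neighbors (x-, y-, z-);
  ! neib(2),(4),(6): higher-numbered neighbors (x+, y+, z+); 0 = none.
  subroutine neighbors(NX, NY, NZ, i, j, k, neib)
    integer, intent(in) :: NX, NY, NZ, i, j, k
    integer, intent(out) :: neib(6)
    integer :: icel
    icel = (k-1)*NX*NY + (j-1)*NX + i   ! assumed numbering
    neib = 0
    if (i > 1)  neib(1) = icel - 1
    if (i < NX) neib(2) = icel + 1
    if (j > 1)  neib(3) = icel - NX
    if (j < NY) neib(4) = icel + NX
    if (k > 1)  neib(5) = icel - NX*NY
    if (k < NZ) neib(6) = icel + NX*NY
  end subroutine neighbors
end module neib_demo
```

On a 3 x 3 x 3 grid the center mesh (2,2,2) has ID 14 and neighbors 13, 15, 11, 17, 5 and 23, while the corner mesh (1,1,1) has only the three higher-numbered neighbors 2, 4 and 10.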

32 OMP-3 31 poi_gen (3/9)
!C
!C-- MULTICOLORING
      allocate (OLDtoNEW(ICELTOT), NEWtoOLD(ICELTOT))
      allocate (COLORindex(0:ICELTOT))

 111  continue
      write (*,'(//a,i8,a)') 'You have', ICELTOT, ' elements.'
      write (*,'( a )') 'How many colors do you need ?'
      write (*,'( a )') ' #COLOR must be more than 2 and'
      write (*,'( a,i8 )') ' #COLOR must not be more than', ICELTOT
      write (*,'( a )') ' CM if #COLOR .eq. 0'
      write (*,'( a )') ' RCM if #COLOR .eq.-1'
      write (*,'( a )') 'CMRCM if #COLOR .le.-2'
      write (*,'( a )') '=>'

      if (NCOLORtot.gt.0) then
        call MC (ICELTOT, NL, NU, INL, IAL, INU, IAU,               &
     &           NCOLORtot, COLORindex, NEWtoOLD, OLDtoNEW)
      endif
      if (NCOLORtot.eq.0) then
        call CM (ICELTOT, NL, NU, INL, IAL, INU, IAU,               &
     &           NCOLORtot, COLORindex, NEWtoOLD, OLDtoNEW)
      endif
      if (NCOLORtot.eq.-1) then
        call RCM (ICELTOT, NL, NU, INL, IAL, INU, IAU,              &
     &            NCOLORtot, COLORindex, NEWtoOLD, OLDtoNEW)
      endif
      if (NCOLORtot.lt.-1) then
        call CMRCM (ICELTOT, NL, NU, INL, IAL, INU, IAU,            &
     &              NCOLORtot, COLORindex, NEWtoOLD, OLDtoNEW)
      endif

      write (*,'(//a,i8,// )') '### FINAL COLOR NUMBER', NCOLORtot

Reordering method selected by NCOLORtot:
    NCOLORtot >  1: Multicolor (MC)
    NCOLORtot =  0: CM
    NCOLORtot = -1: RCM
    NCOLORtot < -1: CM-RCM

33 OMP-3 32 poi_gen (4/9)
      allocate (SMPindex(0:PEsmpTOT*NCOLORtot))
      SMPindex= 0
      do ic= 1, NCOLORtot
        nn1= COLORindex(ic) - COLORindex(ic-1)
        num= nn1 / PEsmpTOT
        nr = nn1 - PEsmpTOT*num
        do ip= 1, PEsmpTOT
          if (ip.le.nr) then
            SMPindex((ic-1)*PEsmpTOT+ip)= num + 1
           else
            SMPindex((ic-1)*PEsmpTOT+ip)= num
          endif
        enddo
      enddo

      do ic= 1, NCOLORtot
        do ip= 1, PEsmpTOT
          j1= (ic-1)*PEsmpTOT + ip
          j0= j1 - 1
          SMPindex(j1)= SMPindex(j0) + SMPindex(j1)
        enddo
      enddo

      allocate (SMPindexG(0:PEsmpTOT))
      SMPindexG= 0
      nn= ICELTOT / PEsmpTOT
      nr= ICELTOT - nn*PEsmpTOT
      do ip= 1, PEsmpTOT
        SMPindexG(ip)= nn
        if (ip.le.nr) SMPindexG(ip)= nn + 1
      enddo

      do ip= 1, PEsmpTOT
        SMPindexG(ip)= SMPindexG(ip-1) + SMPindexG(ip)
      enddo

34 OMP-3 33 SMPindex: for preconditioning
      do ic= 1, NCOLORtot
!$omp parallel do
        do ip= 1, PEsmpTOT
          ip1= (ic-1)*PEsmpTOT + ip
          do i= SMPindex(ip1-1)+1, SMPindex(ip1)
            (...)
          enddo
        enddo
!$omp end parallel do
      enddo
(Figure: initial vector, then coloring (5 colors) + ordering; 5 colors, 8 threads.)
Meshes in the same color are independent, so they can be processed in parallel. Meshes are reordered in ascending order of color ID, and each color is divided among the threads.

36 OMP-3 35 SMPindexG: for dot products, DAXPY, Mat-Vec, and poi_gen
!$omp parallel do
      do ip= 1, PEsmpTOT
        do i= SMPindexG(ip-1)+1, SMPindexG(ip)
          (...)
        enddo
      enddo
!$omp end parallel do
(Figure: the whole vector divided into contiguous blocks ip=1 to ip=8.)
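Both index arrays are block partitions in which the first `nr` threads receive one extra item. The sketch below restates that logic as two small routines so that it can be checked on tiny inputs; the module `smp_index_demo` and the routine names `block_index` and `color_block_index` are illustrative and do not appear in the course code.

```fortran
module smp_index_demo
  implicit none
contains
  ! Split n items into nth contiguous blocks whose sizes differ by at most 1.
  ! Thread ip owns items idx(ip-1)+1 .. idx(ip)  (same role as SMPindexG).
  subroutine block_index(n, nth, idx)
    integer, intent(in) :: n, nth
    integer, intent(out) :: idx(0:nth)
    integer :: ip, nn, nr
    nn = n / nth
    nr = n - nn*nth
    idx(0) = 0
    do ip = 1, nth
      idx(ip) = idx(ip-1) + nn
      if (ip <= nr) idx(ip) = idx(ip) + 1
    enddo
  end subroutine block_index

  ! Same partition applied to each color separately (same role as SMPindex).
  subroutine color_block_index(ncolor, nth, COLORindex, idx)
    integer, intent(in) :: ncolor, nth
    integer, intent(in) :: COLORindex(0:ncolor)
    integer, intent(out) :: idx(0:ncolor*nth)
    integer :: ic, ip, nn1, num, nr, j1
    idx(0) = 0
    do ic = 1, ncolor
      nn1 = COLORindex(ic) - COLORindex(ic-1)
      num = nn1 / nth
      nr  = nn1 - nth*num
      do ip = 1, nth
        j1 = (ic-1)*nth + ip
        idx(j1) = idx(j1-1) + num
        if (ip <= nr) idx(j1) = idx(j1) + 1
      enddo
    enddo
  end subroutine color_block_index
end module smp_index_demo
```

With 10 items and 4 threads, `block_index` gives `[0,3,6,8,10]`. With two colors of 5 and 3 meshes (`COLORindex = [0,5,8]`) and 2 threads, `color_block_index` gives `[0,3,5,7,8]`.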

37 OMP-3 36 poi_gen (5/9)
!C
!C-- 1D array
      nn = ICELTOT
      allocate (indexl(0:nn), indexu(0:nn))
      indexl= 0
      indexu= 0

      do icel= 1, ICELTOT
        indexl(icel)= INL(icel)
        indexu(icel)= INU(icel)
      enddo

      do icel= 1, ICELTOT
        indexl(icel)= indexl(icel) + indexl(icel-1)
        indexu(icel)= indexu(icel) + indexu(icel-1)
      enddo

      NPL= indexl(ICELTOT)
      NPU= indexu(ICELTOT)

      allocate (iteml(NPL), AL(NPL))
      allocate (itemu(NPU), AU(NPU))

      iteml= 0
      itemu= 0
      AL= 0.d0
      AU= 0.d0
The new numbering is applied after this point. The arrays D, BFORCE, PHI, indexl/indexu, NPL/NPU, iteml/itemu and AL/AU are those listed under "Variables/Arrays for Matrix (1/2)".

38 OMP-3 37 poi_gen (6/9)
!C
!C-- INTERIOR & NEUMANN BOUNDARY CELLs
!$omp parallel do private (ip,icel,ic0,icN1,icN2,icN3,icN4,icN5,icN6) &
!$omp&            private (VOL0,coef,j,ii,jj,kk)
      do ip  = 1, PEsmpTOT
      do icel= SMPindexG(ip-1)+1, SMPindexG(ip)
        ic0 = NEWtoOLD(icel)
        icN1= NEIBcell(ic0,1)
        icN2= NEIBcell(ic0,2)
        icN3= NEIBcell(ic0,3)
        icN4= NEIBcell(ic0,4)
        icN5= NEIBcell(ic0,5)
        icN6= NEIBcell(ic0,6)
        VOL0= VOLCEL (ic0)

        if (icN5.ne.0) then
          icN5= OLDtoNEW(icN5)
          coef= RDZ * ZAREA
          D(icel)= D(icel) - coef
          if (icN5.lt.icel) then
            do j= 1, INL(icel)
              if (IAL(j,icel).eq.icN5) then
                iteml(j+indexl(icel-1))= icN5
                   AL(j+indexl(icel-1))= coef
                exit
              endif
            enddo
           else
            do j= 1, INU(icel)
              if (IAU(j,icel).eq.icN5) then
                itemu(j+indexu(icel-1))= icN5
                   AU(j+indexu(icel-1))= coef
                exit
              endif
            enddo
          endif
        endif

Notes on this slide (shown over several builds in the deck):
- Coefficient matrix: the loop is parallelized with SMPindexG; all work variables are private.
- The new numbering is applied: icel is the new ID, ic0 is the old ID.
- coef = RDZ * ZAREA corresponds to (1/Δz)·(Δx Δy).
- icN5 < icel: the entry goes to the lower part (iteml, AL).
- icN5 > icel: the entry goes to the upper part (itemu, AU).

Discretized equation for mesh icel:
$$\frac{\phi_{neib(icel,1)}-\phi_{icel}}{\Delta x}\Delta y\Delta z+\frac{\phi_{neib(icel,2)}-\phi_{icel}}{\Delta x}\Delta y\Delta z+\frac{\phi_{neib(icel,3)}-\phi_{icel}}{\Delta y}\Delta z\Delta x+\frac{\phi_{neib(icel,4)}-\phi_{icel}}{\Delta y}\Delta z\Delta x+\frac{\phi_{neib(icel,5)}-\phi_{icel}}{\Delta z}\Delta x\Delta y+\frac{\phi_{neib(icel,6)}-\phi_{icel}}{\Delta z}\Delta x\Delta y+f_{icel}\,\Delta x\Delta y\Delta z=0$$

44 OMP-3 43 poi_gen (7/9)
        if (icN3.ne.0) then
          icN3= OLDtoNEW(icN3)
          coef= RDY * YAREA
          D(icel)= D(icel) - coef
          if (icN3.lt.icel) then
            do j= 1, INL(icel)
              if (IAL(j,icel).eq.icN3) then
                iteml(j+indexl(icel-1))= icN3
                   AL(j+indexl(icel-1))= coef
                exit
              endif
            enddo
           else
            do j= 1, INU(icel)
              if (IAU(j,icel).eq.icN3) then
                itemu(j+indexu(icel-1))= icN3
                   AU(j+indexu(icel-1))= coef
                exit
              endif
            enddo
          endif
        endif

        if (icN1.ne.0) then
          icN1= OLDtoNEW(icN1)
          coef= RDX * XAREA
          D(icel)= D(icel) - coef
          if (icN1.lt.icel) then
            do j= 1, INL(icel)
              if (IAL(j,icel).eq.icN1) then
                iteml(j+indexl(icel-1))= icN1
                   AL(j+indexl(icel-1))= coef
                exit
              endif
            enddo
           else
            do j= 1, INU(icel)
              if (IAU(j,icel).eq.icN1) then
                itemu(j+indexu(icel-1))= icN1
                   AU(j+indexu(icel-1))= coef
                exit
              endif
            enddo
          endif
        endif

45 OMP-3 44 poi_gen (8/9)
        if (icN2.ne.0) then
          icN2= OLDtoNEW(icN2)
          coef= RDX * XAREA
          D(icel)= D(icel) - coef
          if (icN2.lt.icel) then
            do j= 1, INL(icel)
              if (IAL(j,icel).eq.icN2) then
                iteml(j+indexl(icel-1))= icN2
                   AL(j+indexl(icel-1))= coef
                exit
              endif
            enddo
           else
            do j= 1, INU(icel)
              if (IAU(j,icel).eq.icN2) then
                itemu(j+indexu(icel-1))= icN2
                   AU(j+indexu(icel-1))= coef
                exit
              endif
            enddo
          endif
        endif

        if (icN4.ne.0) then
          icN4= OLDtoNEW(icN4)
          coef= RDY * YAREA
          D(icel)= D(icel) - coef
          if (icN4.lt.icel) then
            do j= 1, INL(icel)
              if (IAL(j,icel).eq.icN4) then
                iteml(j+indexl(icel-1))= icN4
                   AL(j+indexl(icel-1))= coef
                exit
              endif
            enddo
           else
            do j= 1, INU(icel)
              if (IAU(j,icel).eq.icN4) then
                itemu(j+indexu(icel-1))= icN4
                   AU(j+indexu(icel-1))= coef
                exit
              endif
            enddo
          endif
        endif
(Both slides repeat the discretized equation for mesh icel shown with poi_gen (6/9).)

46 OMP-3 45 poi_gen (9/9)
        if (icN6.ne.0) then
          icN6= OLDtoNEW(icN6)
          coef= RDZ * ZAREA
          D(icel)= D(icel) - coef
          if (icN6.lt.icel) then
            do j= 1, INL(icel)
              if (IAL(j,icel).eq.icN6) then
                iteml(j+indexl(icel-1))= icN6
                   AL(j+indexl(icel-1))= coef
                exit
              endif
            enddo
           else
            do j= 1, INU(icel)
              if (IAU(j,icel).eq.icN6) then
                itemu(j+indexu(icel-1))= icN6
                   AU(j+indexu(icel-1))= coef
                exit
              endif
            enddo
          endif
        endif

        ii= XYZ(ic0,1)
        jj= XYZ(ic0,2)
        kk= XYZ(ic0,3)
        BFORCE(icel)= -dfloat(ii+jj+kk) * VOL0
      enddo
      enddo
!$omp end parallel do

BFORCE is computed from the original mesh ID (ic0). ii, jj, kk and VOL0 are private, as declared in the directive that opens this loop:
!$omp parallel do private (ip,icel,ic0,icN1,icN2,icN3,icN4,icN5,icN6) &
!$omp&            private (VOL0,coef,j,ii,jj,kk)

48 OMP-3 47 solve_ICCG_mc (1/6)
!C***
!C*** module solver_ICCG_mc
!C***
!C
      module solver_ICCG_mc
      contains
!C
!C*** solve_ICCG
!C
      subroutine solve_ICCG_mc                                      &
     &   ( N, NPL, NPU, indexl, iteml, indexu, itemu, D, B, X,      &
     &     AL, AU, NCOLORtot, PEsmpTOT, SMPindex, SMPindexG,        &
     &     EPS, ITR, IER)

      implicit REAL*8 (A-H,O-Z)

      integer :: N, NL, NU, NCOLORtot, PEsmpTOT

      real(kind=8), dimension(N)   :: D
      real(kind=8), dimension(N)   :: B
      real(kind=8), dimension(N)   :: X
      real(kind=8), dimension(NPL) :: AL
      real(kind=8), dimension(NPU) :: AU

      integer, dimension(0:N) :: indexl, indexu
      integer, dimension(NPL) :: iteml
      integer, dimension(NPU) :: itemu

      integer, dimension(0:NCOLORtot*PEsmpTOT) :: SMPindex
      integer, dimension(0:PEsmpTOT)           :: SMPindexG

      real(kind=8), dimension(:,:), allocatable :: W

      integer, parameter ::  R= 1
      integer, parameter ::  Z= 2
      integer, parameter ::  Q= 2
      integer, parameter ::  P= 3
      integer, parameter :: DD= 4

49 OMP-3 48 solve_ICCG_mc (2/6)
!C
!C-- INIT
      allocate (W(N,4))

!$omp parallel do private(ip,i)
      do ip= 1, PEsmpTOT
      do i = SMPindexG(ip-1)+1, SMPindexG(ip)
        X(i)  = 0.d0
        W(i,2)= 0.0D0
        W(i,3)= 0.0D0
        W(i,4)= 0.0D0
      enddo
      enddo
!$omp end parallel do

      do ic= 1, NCOLORtot
!$omp parallel do private(ip,ip1,i,VAL,k)
        do ip= 1, PEsmpTOT
          ip1= (ic-1)*PEsmpTOT + ip
          do i= SMPindex(ip1-1)+1, SMPindex(ip1)
            VAL= D(i)
            do k= indexl(i-1)+1, indexl(i)
              VAL= VAL - (AL(k)**2) * W(itemL(k),DD)
            enddo
            W(i,DD)= 1.d0/VAL
          enddo
        enddo
!$omp end parallel do
      enddo
The second loop nest is the Incomplete Modified Cholesky Factorization.

50 OMP-3 49 Incomplete Modified Cholesky Factorization
$$d_i=\left(a_{ii}-\sum_{k=1}^{i-1}a_{ik}^{2}\,d_k\right)^{-1}=l_{ii}^{-1}$$
W(i,DD): d_i
D(i): a_ii
IAL(j,i): k
AL(j,i): a_ik

      do i= 1, N
        VAL= D(i)
        do k= indexl(i-1)+1, indexl(i)
          VAL= VAL - (AL(k)**2) * W(itemL(k),DD)
        enddo
        W(i,DD)= 1.d0/VAL
      enddo

51 OMP-3 50 Incomplete Modified Cholesky Factorization: Parallel Version
      do ic= 1, NCOLORtot
!$omp parallel do private(ip,ip1,i,VAL,k)
        do ip= 1, PEsmpTOT
          ip1= (ic-1)*PEsmpTOT + ip
          do i= SMPindex(ip1-1)+1, SMPindex(ip1)
            VAL= D(i)
            do k= indexl(i-1)+1, indexl(i)
              VAL= VAL - (AL(k)**2) * W(itemL(k),DD)
            enddo
            W(i,DD)= 1.d0/VAL
          enddo
        enddo
!$omp end parallel do
      enddo
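The color-by-color factorization can be checked by hand on a very small case. The sketch below is an illustration under assumptions: the module `ic_demo` and the routine `ic_diag` are made-up names, the directive is placed directly on the loop over the meshes of one color (rather than on a loop over threads with `SMPindex`), and the test matrix in the usage note is a four-cell 1D chain with red/black ordering.

```fortran
module ic_demo
  implicit none
contains
  ! Diagonal of the incomplete Cholesky factorization, color by color:
  !   dd(i) = 1 / ( D(i) - sum_k AL(k)**2 * dd(iteml(k)) )
  ! Rows of one color only reference rows of earlier colors, so the
  ! loop over i inside a color carries no dependency.
  subroutine ic_diag(n, npl, ncolor, COLORindex, D, indexl, iteml, AL, dd)
    integer, intent(in) :: n, npl, ncolor
    integer, intent(in) :: COLORindex(0:ncolor), indexl(0:n), iteml(npl)
    real(kind=8), intent(in) :: D(n), AL(npl)
    real(kind=8), intent(out) :: dd(n)
    integer :: ic, i, k
    real(kind=8) :: val
    do ic = 1, ncolor
!$omp parallel do private(i,val,k)
      do i = COLORindex(ic-1)+1, COLORindex(ic)
        val = D(i)
        do k = indexl(i-1)+1, indexl(i)
          val = val - AL(k)**2 * dd(iteml(k))
        enddo
        dd(i) = 1.d0/val
      enddo
!$omp end parallel do
    enddo
  end subroutine ic_diag
end module ic_demo
```

For the chain with diagonal 2 and off-diagonals -1, the red cells become rows 1 and 2 and the black cells rows 3 and 4, so `COLORindex = [0,2,4]`, `indexl = [0,0,0,2,3]` and `iteml = [1,2,2]`. The result is `dd = [0.5, 0.5, 1, 2/3]`.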

52 OMP-3 51 solve_ICCG_mc (3/6)
!C
!C-- {r0}= {b} - [A]{xini}
!$omp parallel do private(ip,i,VAL,k)
      do ip= 1, PEsmpTOT
      do i = SMPindexG(ip-1)+1, SMPindexG(ip)
        VAL= D(i)*X(i)
        do k= indexl(i-1)+1, indexl(i)
          VAL= VAL + AL(k)*X(itemL(k))
        enddo
        do k= indexu(i-1)+1, indexu(i)
          VAL= VAL + AU(k)*X(itemU(k))
        enddo
        W(i,R)= B(i) - VAL
      enddo
      enddo
!$omp end parallel do

      BNRM2= 0.0D0
!$omp parallel do private(ip,i) reduction(+:BNRM2)
      do ip= 1, PEsmpTOT
      do i = SMPindexG(ip-1)+1, SMPindexG(ip)
        BNRM2 = BNRM2 + B(i) **2
      enddo
      enddo
!$omp end parallel do

Preconditioned CG algorithm (shown beside the code on each solve_ICCG_mc slide; rho, beta and alpha correspond to RHO, BETA and ALPHA in the code):
    Compute r(0) = b - [A]x(0)
    for i = 1, 2, ...
        solve [M]z(i-1) = r(i-1)
        rho(i-1) = r(i-1) . z(i-1)
        if i = 1
            p(1) = z(0)
        else
            beta(i-1) = rho(i-1) / rho(i-2)
            p(i) = z(i-1) + beta(i-1) p(i-1)
        endif
        q(i) = [A]p(i)
        alpha(i) = rho(i-1) / (p(i) . q(i))
        x(i) = x(i-1) + alpha(i) p(i)
        r(i) = r(i-1) - alpha(i) q(i)
        check convergence |r|
    end

53 OMP-3 52 Mat-Vec
No data dependency: the first loop above uses SMPindexG.

55 OMP-3 54 Dot Products
The second loop above uses SMPindexG with a reduction clause on BNRM2.
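The dot-product pattern (an outer loop over threads, an inner loop over each thread's block, and a `reduction` clause) can be written as a small reusable function. This is a sketch only; the module `dot_demo` and the function `dot_smp` are illustrative names, not part of the solver.

```fortran
module dot_demo
  implicit none
contains
  ! Dot product a.b where thread ip sums the block
  ! SMPindexG(ip-1)+1 .. SMPindexG(ip); partial sums are combined
  ! by the reduction clause.
  real(kind=8) function dot_smp(n, nth, SMPindexG, a, b)
    integer, intent(in) :: n, nth
    integer, intent(in) :: SMPindexG(0:nth)
    real(kind=8), intent(in) :: a(n), b(n)
    integer :: ip, i
    real(kind=8) :: s
    s = 0.d0
!$omp parallel do private(ip,i) reduction(+:s)
    do ip = 1, nth
      do i = SMPindexG(ip-1)+1, SMPindexG(ip)
        s = s + a(i)*b(i)
      enddo
    enddo
!$omp end parallel do
    dot_smp = s
  end function dot_smp
end module dot_demo
```

With `n = 5`, two blocks `SMPindexG = [0,3,5]`, `a = [1,2,3,4,5]` and `b` all ones, the function returns 15. Because floating-point addition is not associative, results for general data can differ in the last digits between thread counts.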

56 OMP-3 55 solve_ICCG_mc (4/6)
      ITR= N
      do L= 1, ITR
!C
!C-- {z}= [Minv]{r}
!$omp parallel do private(ip,i)
      do ip= 1, PEsmpTOT
      do i = SMPindexG(ip-1)+1, SMPindexG(ip)
        W(i,Z)= W(i,R)
      enddo
      enddo
!$omp end parallel do

      do ic= 1, NCOLORtot
!$omp parallel do private(ip,ip1,i,WVAL,k)
        do ip= 1, PEsmpTOT
          ip1= (ic-1)*PEsmpTOT + ip
          do i= SMPindex(ip1-1)+1, SMPindex(ip1)
            WVAL= W(i,Z)
            do k= indexl(i-1)+1, indexl(i)
              WVAL= WVAL - AL(k) * W(itemL(k),Z)
            enddo
            W(i,Z)= WVAL * W(i,DD)
          enddo
        enddo
!$omp end parallel do
      enddo

      do ic= NCOLORtot, 1, -1
!$omp parallel do private(ip,ip1,i,SW,k)
        do ip= 1, PEsmpTOT
          ip1= (ic-1)*PEsmpTOT + ip
          do i= SMPindex(ip1-1)+1, SMPindex(ip1)
            SW = 0.0d0
            do k= indexu(i-1)+1, indexu(i)
              SW= SW + AU(k) * W(itemU(k),Z)
            enddo
            W(i,Z)= W(i,Z) - W(i,DD) * SW
          enddo
        enddo
!$omp end parallel do
      enddo

This part implements the step "solve [M]z(i-1) = r(i-1)" of the algorithm. The copy of {r} into {z} has no data dependency and uses SMPindexG; the two substitution loops use SMPindex, color by color.
$$(M)\{z\}=(LDL^{T})\{z\}=\{r\}$$
$$(L)\{z'\}=\{r\}\quad\text{: forward substitution}$$
$$(DL^{T})\{z\}=\{z'\}\quad\text{: backward substitution}$$

59 OMP-3 58 Forward Substitution: SMPindex
The loop over ic= 1, NCOLORtot above is the forward substitution; the loop over ic= NCOLORtot, 1, -1 is the backward substitution. The off-diagonal column is always taken from itemL(k) or itemU(k), never from indexl or indexu.
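A hand-checkable sketch of the two substitution sweeps follows. It is an illustration under assumptions: the module `fb_demo` and the routine `fb_subst` are made-up names, the directives are placed directly on the loop over the meshes of one color (not on a thread loop with `SMPindex`), and the data in the usage note are the four-cell red/black chain with diagonal 2 and off-diagonals -1.

```fortran
module fb_demo
  implicit none
contains
  ! On entry z holds {r}; on exit z holds {z} = [Minv]{r}.
  ! dd(i) is the inverse diagonal from the incomplete factorization.
  subroutine fb_subst(n, npl, npu, ncolor, COLORindex, indexl, iteml, AL, &
                      indexu, itemu, AU, dd, z)
    integer, intent(in) :: n, npl, npu, ncolor
    integer, intent(in) :: COLORindex(0:ncolor)
    integer, intent(in) :: indexl(0:n), iteml(npl), indexu(0:n), itemu(npu)
    real(kind=8), intent(in) :: AL(npl), AU(npu), dd(n)
    real(kind=8), intent(inout) :: z(n)
    integer :: ic, i, k
    real(kind=8) :: wval, sw
    do ic = 1, ncolor                      ! forward: colors in ascending order
!$omp parallel do private(i,wval,k)
      do i = COLORindex(ic-1)+1, COLORindex(ic)
        wval = z(i)
        do k = indexl(i-1)+1, indexl(i)
          wval = wval - AL(k)*z(iteml(k))
        enddo
        z(i) = wval*dd(i)
      enddo
!$omp end parallel do
    enddo
    do ic = ncolor, 1, -1                  ! backward: colors in descending order
!$omp parallel do private(i,sw,k)
      do i = COLORindex(ic-1)+1, COLORindex(ic)
        sw = 0.d0
        do k = indexu(i-1)+1, indexu(i)
          sw = sw + AU(k)*z(itemu(k))
        enddo
        z(i) = z(i) - dd(i)*sw
      enddo
!$omp end parallel do
    enddo
  end subroutine fb_subst
end module fb_demo
```

For the example chain, `indexl = [0,0,0,2,3]`, `iteml = [1,2,2]`, `indexu = [0,1,3,3,3]`, `itemu = [3,3,4]` and `dd = [0.5, 0.5, 1, 2/3]`. Starting from `z = [1,1,1,1]`, the forward sweep gives `[0.5, 0.5, 2, 1]` and the backward sweep gives `[1.5, 2, 2, 1]`.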

60 OMP-3 59! ! {p} = {z} if ITER=1! BETA= RHO / RHO1 otherwise! !=== if ( L.eq.1 ) then!$omp parallel do private(ip,i) do ip= 1, PEsmpTOT do i = SMPindexG(ip-1)+1, SMPindexG(ip) W(i,P)= W(i,Z)!$omp end parallel do else BETA= RHO / RHO1!$omp parallel do private(ip,i) do ip= 1, PEsmpTOT do i = SMPindexG(ip-1)+1, SMPindexG(ip) W(i,P)= W(i,Z) + BETA*W(i,P)!$omp end parallel do!===! ! {q}= [A]{p}! !===!$omp parallel do private(ip,i,val,k) do ip= 1, PEsmpTOT do i = SMPindexG(ip-1)+1, SMPindexG(ip) VAL= D(i)*W(i,P) do k= indexl(i-1)+1, indexl(i) VAL= VAL + AL(k)*W(itemL(k),P) do k= indexu(i-1)+1, indexu(i) VAL= VAL + AU(k)*W(itemU(k),P) W(i,Q)= VAL!$omp end parallel do!=== solve_ig_mc (5/6) ompute r (0) = b-[a]x (0) for i= 1, 2, solve [M]z (i-1) = r (i-1) i-1 = r (i-1) z (i-1) if i=1 p (1) = z (0) else i-1 = i-1 / i-2 p (i) = z (i-1) + i-1 q (i) = [A]p (i) i = i-1 /p (i) q (i) x (i) = x (i-1) + i p (i) r (i) = r (i-1) - i q (i) check convergence r end p (i-1)

62 OMP-3 61 solve_ICCG_mc (6/6)

!C
!C-- ALPHA= RHO / {p}{q}
      C1= 0.d0
!$omp parallel do private(ip,i) reduction(+:C1)
      do ip= 1, PEsmpTOT
        do i = SMPindexG(ip-1)+1, SMPindexG(ip)
          C1= C1 + W(i,P)*W(i,Q)
        enddo
      enddo
!$omp end parallel do
      ALPHA= RHO / C1

!C
!C-- {x}= {x} + ALPHA*{p}
!C   {r}= {r} - ALPHA*{q}
!$omp parallel do private(ip,i)
      do ip= 1, PEsmpTOT
        do i = SMPindexG(ip-1)+1, SMPindexG(ip)
          X(i)  = X(i)   + ALPHA * W(i,P)
          W(i,R)= W(i,R) - ALPHA * W(i,Q)
        enddo
      enddo
!$omp end parallel do

      DNRM2= 0.d0
!$omp parallel do private(ip,i) reduction(+:DNRM2)
      do ip= 1, PEsmpTOT
        do i = SMPindexG(ip-1)+1, SMPindexG(ip)
          DNRM2= DNRM2 + W(i,R)**2
        enddo
      enddo
!$omp end parallel do

Preconditioned CG algorithm:
  Compute $r^{(0)} = b - [A]x^{(0)}$
  for $i = 1, 2, \dots$
    solve $[M]z^{(i-1)} = r^{(i-1)}$
    $\rho_{i-1} = r^{(i-1)} \cdot z^{(i-1)}$
    if $i = 1$
      $p^{(1)} = z^{(0)}$
    else
      $\beta_{i-1} = \rho_{i-1} / \rho_{i-2}$
      $p^{(i)} = z^{(i-1)} + \beta_{i-1}\, p^{(i-1)}$
    endif
    $q^{(i)} = [A]p^{(i)}$
    $\alpha_i = \rho_{i-1} / (p^{(i)} \cdot q^{(i)})$
    $x^{(i)} = x^{(i-1)} + \alpha_i\, p^{(i)}$
    $r^{(i)} = r^{(i-1)} - \alpha_i\, q^{(i)}$
    check convergence $|r|$
  end
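To see the whole iteration in one place, the following C sketch runs the preconditioned CG loop on a small dense matrix. It is an illustration under stated assumptions, not the lecture code: the function name `pcg_jacobi` is hypothetical, the matrix is dense and row-major, and the preconditioner is simply the diagonal of A (point Jacobi) in place of the incomplete Cholesky factorization. The variable names rho, rho1, beta, c1, alpha and dnrm2 mirror RHO, RHO1, BETA, C1, ALPHA and DNRM2 in the Fortran listing.

```c
#include <stdlib.h>

/* Preconditioned CG for a dense SPD matrix A (row-major, n x n).
 * Sketch assumption: [M] = diag(A) stands in for the incomplete Cholesky
 * preconditioner. x holds the initial guess on entry.
 * Returns the iteration count, or -1 if memory allocation fails. */
int pcg_jacobi(int n, const double *A, const double *b, double *x,
               double eps, int max_iter)
{
    double *w = calloc(4 * (size_t)n, sizeof(double));
    if (w == NULL) return -1;
    double *r = w, *z = w + n, *p = w + 2 * n, *q = w + 3 * n;
    double rho1 = 0.0, bnrm2 = 0.0;
    int iter = 0;

    /* r(0) = b - [A] x(0) */
    for (int i = 0; i < n; i++) {
        double val = b[i];
        for (int j = 0; j < n; j++) val -= A[i * n + j] * x[j];
        r[i] = val;
        bnrm2 += b[i] * b[i];
    }
    if (bnrm2 == 0.0) bnrm2 = 1.0;

    while (iter < max_iter) {
        iter++;
        double rho = 0.0, c1 = 0.0, dnrm2 = 0.0;

        /* solve [M] z = r and accumulate rho = r.z */
#pragma omp parallel for reduction(+:rho)
        for (int i = 0; i < n; i++) {
            z[i] = r[i] / A[i * n + i];
            rho += r[i] * z[i];
        }

        /* p = z in the first iteration, p = z + beta*p afterwards */
        double beta = (iter == 1) ? 0.0 : rho / rho1;
        for (int i = 0; i < n; i++) p[i] = z[i] + beta * p[i];

        /* q = [A] p and c1 = p.q */
#pragma omp parallel for reduction(+:c1)
        for (int i = 0; i < n; i++) {
            double val = 0.0;
            for (int j = 0; j < n; j++) val += A[i * n + j] * p[j];
            q[i] = val;
            c1 += p[i] * val;
        }
        if (c1 == 0.0) break;          /* p is zero: nothing left to correct */
        double alpha = rho / c1;

        /* x = x + alpha*p, r = r - alpha*q, dnrm2 = r.r */
#pragma omp parallel for reduction(+:dnrm2)
        for (int i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
            dnrm2 += r[i] * r[i];
        }
        rho1 = rho;
        if (dnrm2 <= eps * eps * bnrm2) break;   /* ||r|| / ||b|| <= eps */
    }
    free(w);
    return iter;
}
```

The three `reduction` clauses correspond to the three dot products of the algorithm; without them the shared accumulators would be updated by several threads at once.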

65 OMP-3 64 Applying OpenMP to -sol Examples Optimization + Exercise

66 OMP-3 65 Results
Hitachi SR11000/J2, 1 node, 16 cores
[Figure: node block diagrams showing cores, L3 caches, and memory]

67 OMP-3 66 SR11000, 1-node/16-cores (MC, RCM, CM-RCM)
[Figure, four panels plotted against COLOR# on a logarithmic axis: Iterations; Incompatible Point # (IP#); Time (solver) in sec.; Time/Iteration in sec./iteration. Annotation on the plot: "anomalous behavior".]

68 OMP-3 67 FX10, 1-node/16-cores (MC, RCM, CM-RCM)
[Figure, four panels plotted against COLOR#, each with curves for MC, RCM, and CM-RCM: Iterations; Number of Incompatible Nodes (IP#); Time (solver) in sec.; Time/Iteration in sec./iteration.]

69 OMP-3 68 Applying OpenMP to -sol Examples Optimization + Exercise

70 OMP-E 69
Running the Code
Further Optimization
Profiler, Analyzing Compile Lists

71 OMP-1 70 Compile & Run
>$ cd <$O-L3>/src
>$ make
>$ ls ../run/L3-sol
L3-sol
>$ cd ../run
>$ pjsub go1.sh

72 OMP-3 71 Running L3-sol
INPUT.DAT: control file (input)
L3-sol: Poisson solver, FVM
test.inp: ParaView file (output)

73 OMP-3 72 Control Data: INPUT.DAT
                  NX/NY/NZ
  1.00e e e-00    DX/DY/DZ
  1.0e-08         EPSICCG
  16              PEsmpTOT
  100             NCOLORtot

NX,NY,NZ    Number of meshes in the X/Y/Z directions
DX,DY,DZ    Size of meshes
EPSICCG     Convergence criterion for ICCG
PEsmpTOT    Number of threads
NCOLORtot   Reordering method + initial number of colors/levels
            ≥2: MC, =0: CM, =-1: RCM, ≤-2: CM-RCM
[Figure: mesh with NX, NY, NZ cells along the X, Y, Z axes]
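The sign convention of NCOLORtot can be written down as a small decoding function. This is only a sketch under an assumption: the comparison signs in the slide text did not survive extraction, so the ranges below (2 or more for MC, -2 or less for CM-RCM) are inferred from the later examples "NCOLORtot= -20: CM-RCM(20)" and "NCOLORtot= -1: RCM". The type `ordering_t` and the function `decode_ncolortot` are hypothetical names, not part of L3-sol.

```c
/* Hypothetical helper: map the NCOLORtot value read from INPUT.DAT to a
 * reordering method and an initial number of colors/levels. */
typedef enum {
    ORDER_MC,       /* multicoloring                 */
    ORDER_CM,       /* Cuthill-McKee                 */
    ORDER_RCM,      /* reverse Cuthill-McKee         */
    ORDER_CMRCM,    /* cyclic multicoloring + RCM    */
    ORDER_INVALID   /* value not covered by the list */
} ordering_t;

ordering_t decode_ncolortot(int NCOLORtot, int *ncolor_init)
{
    if (NCOLORtot >= 2)  { *ncolor_init = NCOLORtot;  return ORDER_MC; }
    if (NCOLORtot == 0)  { *ncolor_init = 0;          return ORDER_CM; }
    if (NCOLORtot == -1) { *ncolor_init = 0;          return ORDER_RCM; }
    if (NCOLORtot <= -2) { *ncolor_init = -NCOLORtot; return ORDER_CMRCM; }
    *ncolor_init = 0;       /* NCOLORtot == 1 is not listed on the slide */
    return ORDER_INVALID;
}
```

For CM and RCM the number of levels is determined by the mesh, so the sketch returns 0 as a placeholder for the initial count.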

74 OMP-1 73 go1.sh
#!/bin/sh
#PJM -L "node=1"
#PJM -L "elapse=00:10:00"
#PJM -L "rscgrp=lecture"
#PJM -g "gt71"
#PJM -j
#PJM -o test.lst
export OMP_NUM_THREADS=16
./L3-sol

OMP_NUM_THREADS = PEsmpTOT

75 OMP-3 74 Results on FX10, 10^6 meshes
Iterations: MC(2): 333, RCM (298 levels): 224, CM-RCM(Nc=20): 249
16 threads: MC(2): 2.42 sec., CM-RCM(20): 2.01 sec.
[Figure, two panels plotted against thread#, each with curves for MC=2, RCM(298), and CM-RCM(20): solver time in sec.; Speed-Up]

76 75 Exercise
Various Configurations
Problem Size
Number of Threads
Number of Colors, Reordering Method (MC, RCM, CM-RCM)

77 OMP-E 76
Running the Code
Further Optimization
  OpenMP Statement
  Sequential Reordering
  ELL
Profiler, Analyzing Compile Lists

78 OMP-E 77 Forward Subst.: Current Impl. (F)

      do ic= 1, NCOLORtot
!$omp parallel do private(ip,ip1,i,WVAL,k)
        do ip= 1, PEsmpTOT
          ip1= (ic-1)*PEsmpTOT + ip
          do i= SMPindex(ip1-1)+1, SMPindex(ip1)
            WVAL= W(i,Z)
            do k= indexL(i-1)+1, indexL(i)
              WVAL= WVAL - AL(k) * W(itemL(k),Z)
            enddo
            W(i,Z)= WVAL * W(i,DD)
          enddo
        enddo
!$omp end parallel do
      enddo

At each !$omp parallel do, threads (up to 16) are created and destroyed.
This happens once per color, which costs some overhead.
The overhead grows as the number of colors increases.

79 OMP-E 78 Forward Subst.: Current Impl. (C)

for(ic=0; ic<NCOLORtot; ic++) {
#pragma omp parallel for private (ip, ip1, i, WVAL, j)
  for(ip=0; ip<PEsmpTOT; ip++) {
    ip1 = ic * PEsmpTOT + ip;
    for(i=SMPindex[ip1]; i<SMPindex[ip1+1]; i++){
      WVAL = W[Z][i];
      for(j=indexL[i]; j<indexL[i+1]; j++){
        WVAL -= AL[j] * W[Z][itemL[j]-1];
      }
      W[Z][i] = WVAL * W[DD][i];
    }
  }
}

At each #pragma omp parallel for, threads (up to 16) are created and destroyed.
This happens once per color, which costs some overhead.
The overhead grows as the number of colors increases.

80 OMP-E 79 For. Subst.: Reduced Overhead (F)

!$omp parallel private(ip,ip1,i,WVAL,k)
      do ic= 1, NCOLORtot
!$omp do
        do ip= 1, PEsmpTOT
          ip1= (ic-1)*PEsmpTOT + ip
          do i= SMPindex(ip1-1)+1, SMPindex(ip1)
            WVAL= W(i,Z)
            do k= indexL(i-1)+1, indexL(i)
              WVAL= WVAL - AL(k) * W(itemL(k),Z)
            enddo
            W(i,Z)= WVAL * W(i,DD)
          enddo
        enddo
      enddo
!$omp end parallel

Threads are created just once, before the forward substitution starts.
Only the loops marked with !$omp do are divided among the threads.

81 OMP-E 80 For. Subst.: Reduced Overhead (C)

/* ic is listed as private: in C, a plain loop counter declared outside
   the parallel region is shared by default */
#pragma omp parallel private (ic, ip, ip1, i, WVAL, j)
for(ic=0; ic<NCOLORtot; ic++) {
#pragma omp for
  for(ip=0; ip<PEsmpTOT; ip++) {
    ip1 = ic * PEsmpTOT + ip;
    for(i=SMPindex[ip1]; i<SMPindex[ip1+1]; i++){
      WVAL = W[Z][i];
      for(j=indexL[i]; j<indexL[i+1]; j++){
        WVAL -= AL[j] * W[Z][itemL[j]-1];
      }
      W[Z][i] = WVAL * W[DD][i];
    }
  }
}

Threads are created just once, before the forward substitution starts.
Only the loops marked with #pragma omp for are divided among the threads.
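The reduced-overhead structure can be tried on a tiny system with the following self-contained C sketch. It is an illustration, not the L3-sol0 source: the function name `forward_subst` is hypothetical, the vectors are passed as plain arrays instead of columns of W, and `itemL` is assumed to hold 0-based column numbers. As in the listings above, `DD` holds the reciprocal of the diagonal and `z` holds {r} on entry.

```c
/* Forward substitution, color by color, inside one parallel region.
 * SMPindex has ncolor*nthreads+1 entries; block ic*nthreads+ip is the
 * part of color ic handled by thread block ip. */
void forward_subst(int ncolor, int nthreads, const int *SMPindex,
                   const int *indexL, const int *itemL, const double *AL,
                   const double *DD, double *z)
{
#pragma omp parallel
    {
        /* ic is declared inside the region, so each thread has its own copy */
        for (int ic = 0; ic < ncolor; ic++) {
#pragma omp for
            for (int ip = 0; ip < nthreads; ip++) {
                int ip1 = ic * nthreads + ip;
                for (int i = SMPindex[ip1]; i < SMPindex[ip1 + 1]; i++) {
                    double wval = z[i];
                    for (int j = indexL[i]; j < indexL[i + 1]; j++)
                        wval -= AL[j] * z[itemL[j]];
                    z[i] = wval * DD[i];
                }
            }
            /* implicit barrier at the end of "omp for": color ic is
               finished before any thread starts color ic+1 */
        }
    }
}
```

The implicit barrier after each work-shared loop is what preserves the dependency between colors, so it should not be removed with `nowait`.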

82 OMP-E 81 Programs
% cd <$O-L3>
% ls
run reorder0 src src0
% cd src0
% make
% cd ../run
% ls L3-sol0
L3-sol0
% <modify INPUT.DAT>
% <modify go0.sh>
% pjsub go0.sh

83 OMP-E 82 Results: L3-sol0 is better
N=128^3
NCOLORtot= -20: CM-RCM(20), 318 iterations
NCOLORtot= -1: RCM (382 levels), 287 iterations
[Table: solver time in sec. for L3-sol and L3-sol0 in both configurations; the only legible entry is 5.69 sec.]

84 OMP-E 83
Running the Code
Further Optimization
  OpenMP Statement
  Sequential Reordering
  ELL
Profiler, Analyzing Compile Lists

85 OMP-3 84 Problems in Reordering
Coloring: MC, RCM, CM-RCM
Renumbering follows the color/level ID.
On each thread, the numbering is therefore not continuous, which reduces performance.

86 OMP-3 85 SMPindex: for preconditioning

      do ic= 1, NCOLORtot
!$omp parallel do
        do ip= 1, PEsmpTOT
          ip1= (ic-1)*PEsmpTOT+ip
          do i= SMPindex(ip1-1)+1, SMPindex(ip1)
            ( )
          enddo
        enddo
!$omp end parallel do
      enddo

5 colors, 8 threads
Meshes in the same color are independent: parallel processing is possible.
Reordering is in ascending order according to color ID.
[Figure: initial vector, then coloring (5 colors) + ordering, with blocks color=1 to color=5 each divided among the threads]

87 OMP-3 86 Sequential Reordering
Reordering for continuous memory access on each thread (core)
Performance is expected to be better:
  continuous addresses in arrays such as the coefficient matrices
  locality (shown two slides later)
Inconsistent numbering: itemL(k) > icel may occur for indexL(icel-1)+1 ≤ k ≤ indexL(icel)

88 OMP-3 87 Sequential Reordering
Further reordering for continuous memory access on each thread (5 colors, 8 threads)
[Figure: initial vector, then coloring (5 colors) + ordering (color=1 to color=5), shown in two layouts: Coalesced and Sequential]

89 OMP-3 88 Sequential Reordering
CM-RCM(2), 4 threads
Continuous data access on a thread: utilization of cache, prefetching
[Figure: mesh numbering for CM-RCM(2) and for CM-RCM(2) with sequential reordering, 4 threads]

90 OMP-3 89 Sequential Reordering
CM-RCM(2), 4 threads: 1st color on thread #0, #1, #2, #3
[Figure: mesh numbering for CM-RCM(2) and for CM-RCM(2) with sequential reordering, 4 threads]

91 OMP-3 90 Sequential Reordering
CM-RCM(2), 4 threads: 2nd color on thread #0, #1, #2, #3
[Figure: mesh numbering for CM-RCM(2) and for CM-RCM(2) with sequential reordering, 4 threads]

92 OMP-3 91 Sequential Reordering
Coalesced (good for GPU): discontinuous memory access on each thread (meshes are numbered in color order)
Sequential: meshes are numbered contiguously within each thread
[Figure: initial vector, then coloring (5 colors) + ordering (color=1 to color=5), shown for the Coalesced and the Sequential layout]

93 OMP-3 92 Files on FX10
Location            <$O-L3>/src, <$O-L3>/run
Compile/Run
  Main Part         cd <$O-L3>/reorder0
                    make
                    <$O-L3>/run/L3-rsol0 (exec)
  Control Data      <$O-L3>/run/INPUT.DAT
  Batch Job Script  <$O-L3>/run/gor.sh

94 OMP-3 93 INPUT.DAT
                  NX/NY/NZ
  1.00e e e-00    DX/DY/DZ
  1.0e-08         EPSICCG
  16              PEsmpTOT
  100             NCOLORtot
  0               NFLAG
  0               METHOD

PEsmpTOT    Number of threads
NCOLORtot   Reordering method + initial number of colors/levels
            ≥2: MC, =0: CM, =-1: RCM, ≤-2: CM-RCM
NFLAG       =0: without first-touch, =1: with first-touch
METHOD      Loop structure for Mat-Vec
            =0: conventional way, =1: similar to forward/backward substitution
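What NFLAG=1 switches on can be illustrated with a short C sketch of first-touch initialization. This is an assumption-laden sketch, not the reorder0 source: the function name `alloc_first_touch` is hypothetical, and it presumes an operating system that places a memory page on the NUMA node of the thread that first writes to it, which is a common default but not guaranteed everywhere.

```c
#include <stdlib.h>

/* Allocate a vector of n doubles and initialize it with the same
 * thread-to-range mapping (SMPindexG, nthreads+1 entries) that the
 * solver loops use later, so that each thread first touches the pages
 * it will work on. Returns NULL if the allocation fails. */
double *alloc_first_touch(int n, int nthreads, const int *SMPindexG)
{
    double *v = malloc(sizeof(double) * (size_t)n);
    if (v == NULL) return NULL;
#pragma omp parallel for schedule(static)
    for (int ip = 0; ip < nthreads; ip++) {
        for (int i = SMPindexG[ip]; i < SMPindexG[ip + 1]; i++)
            v[i] = 0.0;   /* the first write decides where the page lives */
    }
    return v;
}
```

The benefit only appears when the later compute loops use the same static mapping of blocks to threads; on a single-socket node with uniform memory access, little difference would be expected.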

95 OMP-3 94 Sequential Reordering

      allocate (SMPindex(0:PEsmpTOT*NCOLORtot))
      SMPindex= 0
      do ic= 1, NCOLORtot
        nn1= COLORindex(ic) - COLORindex(ic-1)
        num= nn1 / PEsmpTOT
        nr = nn1 - PEsmpTOT*num
        do ip= 1, PEsmpTOT
          if (ip.le.nr) then
            SMPindex((ic-1)*PEsmpTOT+ip)= num + 1
          else
            SMPindex((ic-1)*PEsmpTOT+ip)= num
          endif
        enddo
      enddo

      allocate (SMPindex_new(0:PEsmpTOT*NCOLORtot))
      SMPindex_new(0)= 0
      do ic= 1, NCOLORtot
        do ip= 1, PEsmpTOT
          j1= (ic-1)*PEsmpTOT + ip
          j0= j1 - 1
          SMPindex_new((ip-1)*NCOLORtot+ic)= SMPindex(j1)
          SMPindex(j1)= SMPindex(j0) + SMPindex(j1)
        enddo
      enddo

      do ip= 1, PEsmpTOT
        do ic= 1, NCOLORtot
          j1= (ip-1)*NCOLORtot + ic
          j0= j1 - 1
          SMPindex_new(j1)= SMPindex_new(j0) + SMPindex_new(j1)
        enddo
      enddo

[Figure: SMPindex is laid out color by color (ic=1, ic=2, ...), with the threads inside each color; SMPindex_new is laid out thread by thread, with the colors inside each thread]
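The construction of the two index arrays can be checked on small numbers with the C sketch below. It follows the same arithmetic as the Fortran listing but is a hypothetical 0-based transcription: the function name `build_smpindex` is not from the lecture code, and both prefix sums are formed in one final loop.

```c
/* COLORindex has ncolor+1 entries (prefix sums of meshes per color).
 * SMPindex and SMPindex_new each have ncolor*nthreads+1 entries.
 * SMPindex     : blocks ordered color by color (block ic*nthreads+ip)
 * SMPindex_new : blocks ordered thread by thread (block ip*ncolor+ic) */
void build_smpindex(int ncolor, int nthreads, const int *COLORindex,
                    int *SMPindex, int *SMPindex_new)
{
    SMPindex[0] = 0;
    SMPindex_new[0] = 0;
    for (int ic = 0; ic < ncolor; ic++) {
        int nn1 = COLORindex[ic + 1] - COLORindex[ic];  /* meshes in color */
        int num = nn1 / nthreads;
        int nr  = nn1 - nthreads * num;                 /* remainder */
        for (int ip = 0; ip < nthreads; ip++) {
            int cnt = (ip < nr) ? num + 1 : num;        /* block size */
            SMPindex[ic * nthreads + ip + 1]   = cnt;
            SMPindex_new[ip * ncolor + ic + 1] = cnt;
        }
    }
    /* turn block sizes into prefix sums */
    for (int j = 1; j <= ncolor * nthreads; j++) {
        SMPindex[j]     += SMPindex[j - 1];
        SMPindex_new[j] += SMPindex_new[j - 1];
    }
}
```

With 2 colors of 3 and 6 meshes on 2 threads, the block sizes are 2, 1 for the first color and 3, 3 for the second. Ordered by color this gives boundaries 0, 2, 3, 6, 9; ordered by thread it gives 0, 2, 5, 6, 9, so thread 0 owns meshes 0 to 4 and thread 1 owns meshes 5 to 8 as one contiguous range each.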

96 OMP-3 95 Mat-Vec: METHOD=0

Original:

!$omp parallel do private(ip,i,VAL,k)
      do ip= 1, PEsmpTOT
        do i = SMPindexG(ip-1)+1, SMPindexG(ip)
          VAL= D(i)*W(i,P)
          do k= indexL(i-1)+1, indexL(i)
            VAL= VAL + AL(k)*W(itemL(k),P)
          enddo
          do k= indexU(i-1)+1, indexU(i)
            VAL= VAL + AU(k)*W(itemU(k),P)
          enddo
          W(i,Q)= VAL
        enddo
      enddo
!$omp end parallel do

New:

!$omp parallel do private(ip,i,VAL,k)
      do ip= 1, PEsmpTOT
        do i= SMPindex((ip-1)*NCOLORtot)+1, SMPindex(ip*NCOLORtot)
          VAL= D(i)*W(i,P)
          do k= indexL(i-1)+1, indexL(i)
            VAL= VAL + AL(k)*W(itemL(k),P)
          enddo
          do k= indexU(i-1)+1, indexU(i)
            VAL= VAL + AU(k)*W(itemU(k),P)
          enddo
          W(i,Q)= VAL
        enddo
      enddo
!$omp end parallel do

97 OMP-3 96 Forward Substitution

Original:

!$omp parallel private(ip,ip1,i,WVAL,k)
      do ic= 1, NCOLORtot
!$omp do
        do ip= 1, PEsmpTOT
          ip1= (ic-1)*PEsmpTOT + ip
          do i= SMPindex(ip1-1)+1, SMPindex(ip1)
            WVAL= W(i,Z)
            do k= indexL(i-1)+1, indexL(i)
              WVAL= WVAL - AL(k) * W(itemL(k),Z)
            enddo
            W(i,Z)= WVAL * W(i,DD)
          enddo
        enddo
      enddo
!$omp end parallel

New:

!$omp parallel private(ip,ip1,i,WVAL,k)
      do ic= 1, NCOLORtot
!$omp do
        do ip= 1, PEsmpTOT
          ip1= (ip-1)*NCOLORtot + ic
          do i= SMPindex(ip1-1)+1, SMPindex(ip1)
            WVAL= W(i,Z)
            do k= indexL(i-1)+1, indexL(i)
              WVAL= WVAL - AL(k) * W(itemL(k),Z)
            enddo
            W(i,Z)= WVAL * W(i,DD)
          enddo
        enddo
      enddo
!$omp end parallel

98 OMP-3 97 Matrix Storage Format
ELL (Ellpack-Itpack): fixed loop length, good for prefetching
[Figure: (a) CRS, (b) ELL]
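The difference between the two formats can be made concrete with a small C sketch that converts CRS to ELL and multiplies with the ELL arrays. It is a sketch under assumptions rather than the lecture's ELL code: the function names `crs_to_ell` and `matvec_ell` are hypothetical, the ELL arrays are stored row by row (entry k of row i at i*width+k), and padding entries use coefficient 0.0 with the row's own index as column.

```c
#include <assert.h>

/* Convert CRS (index: n+1 entries, item: 0-based columns) to ELL with a
 * fixed number of entries per row. width must be at least the length of
 * the longest row. */
void crs_to_ell(int n, int width, const int *index, const int *item,
                const double *A, int *itemELL, double *AELL)
{
    for (int i = 0; i < n; i++) {
        int len = index[i + 1] - index[i];
        assert(len <= width);
        for (int k = 0; k < width; k++) {
            if (k < len) {
                itemELL[i * width + k] = item[index[i] + k];
                AELL[i * width + k]    = A[index[i] + k];
            } else {
                itemELL[i * width + k] = i;     /* harmless padding column */
                AELL[i * width + k]    = 0.0;   /* padding contributes 0   */
            }
        }
    }
}

/* y = [A] x with A in ELL: the inner loop has the same length for every
 * row, which is the "fixed loop length" property. */
void matvec_ell(int n, int width, const int *itemELL, const double *AELL,
                const double *x, double *y)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        double val = 0.0;
        for (int k = 0; k < width; k++)
            val += AELL[i * width + k] * x[itemELL[i * width + k]];
        y[i] = val;
    }
}
```

The price of the fixed length is the padding: rows shorter than `width` store and multiply zeros, so ELL suits matrices whose rows have nearly equal length, as in structured finite-volume meshes.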

99 OMP-3 98 Cases

                                Case-1 (src0)           Case-2 (reorder0)        Case-3 (reorder0 + ELL)
Coloring for parallelization    CM-RCM                  CM-RCM                   CM-RCM
Further reordering              Coalesced (Fig. 4(a))   Sequential (Fig. 4(b))   Sequential (Fig. 4(b))
First-touch data placement      NO                      YES                      YES
Matrix storage format           CRS                     CRS                      ELL

Coalesced: discontinuous memory access on each thread (meshes are numbered in color order)
Sequential: meshes are numbered contiguously within each thread
[Figure: initial vector, then coloring (5 colors) + ordering (color=1 to color=5), shown for the Coalesced and the Sequential layout]

100 OMP-3 99 Color# ~ Iteration
[Figure: iterations vs. COLOR# for CM-RCM]

101 OMP Results: FX10
CASE-1 (src0) to CASE-2 (reorder0): slightly improved when the number of colors is larger.
Generally speaking, performance gets worse as the number of colors increases.
In CASE-2, the data on each thread stays contiguous when the computation proceeds to the next color.
First Touch: no effect
ELL: big effect
[Figure: solver time (sec.) vs. COLOR# for Case-1, Case-2, and Case-3]
Case-1: src0, Case-2: reorder0, Case-3: reorder0 + ELL

102 OMP-3 Fujitsu FX10: CASE-1, CM-RCM(2)
src0: CRS, Coalesced
Demand miss: 25.6%, memory throughput: 41.8 GB/sec.
Forward/Backward Substitution
[Figure: stacked bar chart of execution time in seconds (0 to 4.0E+00) for Thread 0 to Thread 15. Legend: integer load memory access wait; floating-point load memory access wait; store wait; integer load cache access wait; floating-point load cache access wait; integer operation wait; floating-point operation wait; branch instruction wait; instruction fetch wait; barrier synchronization wait; uop commit; other waits; 1 instruction commit; integer register write constraint; 2/3 instruction commit; 4 instruction commit]

103 OMP Fujitsu FX10: CASE-2, CM-RCM(2)
reorder0: CRS, Sequential
Demand miss: 25.6%, memory throughput: 41.8 GB/sec.
[Figure: stacked bar chart of execution time in seconds (0 to 4.0E+00) for Thread 0 to Thread 15, broken down into memory, cache, operation, branch, fetch, and barrier wait categories and instruction commit categories]

104 OMP Fujitsu FX10: CASE-1, CM-RCM(382)
src0: CRS, Coalesced
Demand miss: 37.7%, memory throughput: 28.7 GB/sec.
[Figure: stacked bar chart of execution time in seconds (0 to 4.0E+00) for Thread 0 to Thread 15, broken down into memory, cache, operation, branch, fetch, and barrier wait categories and instruction commit categories]

105 OMP Fujitsu FX10: CASE-2, CM-RCM(382)
reorder0: CRS, Sequential
Demand miss: 29.3%, memory throughput: 32.6 GB/sec.
[Figure: stacked bar chart of execution time in seconds (0 to 4.0E+00) for Thread 0 to Thread 15, broken down into memory, cache, operation, branch, fetch, and barrier wait categories and instruction commit categories]

106 OMP Summary: Fujitsu FX10
Analysis by Profiler
Upper: Demand Miss Rate, Lower: Memory Throughput

               CASE-1 (src0)      CASE-2 (reorder0)   CASE-3
               CRS + Coalesced    CRS + Sequential    ELL + Sequential
CM-RCM(2)      25.5 %             25.6 %              5.42 %
               41.8 GB/sec.       …                   …
CM-RCM(382)    …                  29.3 %              16.5 %
               28.7 GB/sec.       …                   …

107 OMP Summary: Fujitsu FX10
Analysis by Profiler
Upper: CM-RCM(20), Lower: CM-RCM(382)
[Table: Instructions, SIMD (%), and Memory Access Throughput (%) for Case-2 (CRS) and Case-3 (ELL)]
Case-1: src0, Case-2: reorder0, Case-3: reorder0 + ELL

108 OMP Results: Cray XE6
CASE-1 (src0) to CASE-2 (reorder0): significant improvement
  Optimization for NUMA architecture + First Touch
CRS to ELL: improvement is not so large
[Figure: solver time (sec.) vs. COLOR# for Case-1, Case-2, and Case-3]
Case-1: src0, Case-2: reorder0, Case-3: reorder0 + ELL

109 OMP
[Figure: node architecture diagrams (cores, L3 caches, memory) of T2K/Tokyo, Cray XE6 (Hopper), and Fujitsu FX10 (Oakleaf-FX)]

110 OMP Summary
[Table: Time (sec.) and Time/Iteration (sec.) on Fujitsu FX10 and Cray XE6, for CM-RCM(20) and CM-RCM(382) = RCM, each with Case-1, Case-2, and Case-3]
Case-1: src0, Case-2: reorder0, Case-3: reorder0 + ELL

111 OMP Fujitsu FX10: CASE-3, CM-RCM(2)
ELL, Sequential
Demand miss: 5.4%, memory throughput: 64.0 GB/sec.
[Figure: stacked bar chart of execution time in seconds (0 to 4.0E+00) for Thread 0 to Thread 15, broken down into memory, cache, operation, branch, fetch, and barrier wait categories and instruction commit categories]

112 OMP Fujitsu FX10: CASE-3, CM-RCM(382)
ELL, Sequential
Demand miss: 16.5%, memory throughput: 52.2 GB/sec.
[Figure: stacked bar chart of execution time in seconds (0 to 4.0E+00) for Thread 0 to Thread 15, broken down into memory, cache, operation, branch, fetch, and barrier wait categories and instruction commit categories]

113 OMP
Running the Code
Further Optimization
Profiler, Analyzing Compile Lists
  Users Portal > Document > Programming Development Support Tool > Profiler User's Guide > Chap. 3 Advanced Profiler

114 113 Default Compile & Run
>$ cd <$O-L3>/src
>$ make
>$ ls ../run/L3-sol
L3-sol
>$ cd ../run
>$ pjsub go1.sh

F90         = frtpx
F90OPTFLAGS= -Kfast,openmp -Qt
F90FLAGS    =$(F90OPTFLAGS)

-Qt: list of messages by the compiler (compile list), written to *.lst. Fortran only.
In C, -Qt is not available. Please use -Nsrc (the list is displayed on screen).

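115 114 The current version of the C/C++ compiler can produce a list of messages.
Fortran/C/C++:
  -Nlst=p  Standard optimization information (default)
  -Nlst=t  Detailed optimization information
Fortran ONLY:
  -Nlst=a  Attribute information of names
  -Nlst=d  Structure information of derived types
  -Nlst=i  Program list of included files and list of include file names
  -Nlst=m  Source program output in which the result of automatic parallelization is expressed with OpenMP directives
  -Nlst=x  Cross-reference information of names and statement numbers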

116 Info in *.lst 115

117 SIMD Information 116

118 Automatic Parallelization 117

OpenMP/OpenACC によるマルチコア メニィコア並列プログラミング入門 Fortran 編第 Ⅳ 部 :OpenMP による並列化 + 演習 中島研吾 東京大学情報基盤センター

OpenMP/OpenACC によるマルチコア メニィコア並列プログラミング入門 Fortran 編第 Ⅳ 部 :OpenMP による並列化 + 演習 中島研吾 東京大学情報基盤センター OpenMP/OpenACC によるマルチコア メニィコア並列プログラミング入門 Fortran 編第 Ⅳ 部 :OpenMP による並列化 + 演習 中島研吾 東京大学情報基盤センター OMP-3 1 OpenMP 並列化 L2-sol を OpenMP によって並列化する 並列化にあたってはスレッド数を PEsmpTOT によってプログラム内で調節できる方法を適用する 基本方針 同じ 色 ( または

More information

Microsoft PowerPoint - omp-02.ppt

Microsoft PowerPoint - omp-02.ppt 科学技術計算のための マルチコアプログラミング入門第 Ⅱ 部 : オーダリング 2009 年 9 月 14 日 15 日中島研吾 2009-09-14/15 2 データ依存性の解決策は? オーダリング (Ordering) について Red-Black,Multicolor(MC) Cuthill-McKee(CM),Reverse-CM(RCM) オーダリングと収束の関係 オーダリングの実装 オーダリング付

More information

OpenACCによる並列化

OpenACCによる並列化 実習 OpenACC による ICCG ソルバーの並列化 1 ログイン Reedbush へのログイン $ ssh reedbush.cc.u-tokyo.ac.jp l txxxxx Module のロード $ module load pgi/17.3 cuda ログインするたびに必要です! ワークディレクトリに移動 $ cdw ターゲットプログラム /srcx OpenACC 用のディレクトリの作成

More information

Microsoft PowerPoint - omp-c-02.ppt [互換モード]

Microsoft PowerPoint - omp-c-02.ppt [互換モード] 科学技術計算のための マルチコアプログラミング入門 C 言語編第 Ⅱ 部 : オーダリング 中島研吾 東京大学情報基盤センター 2 データ依存性の解決策は? オーダリング (Ordering) について Red-Black,Multicolor(MC) Cuthill-McKee(CM),Reverse-CM(RCM) オーダリングと収束の関係 オーダリングの実装 オーダリング付 ICCG 法の実装

More information

GeoFEM開発の経験から

GeoFEM開発の経験から FrontISTR における並列計算のしくみ < 領域分割に基づく並列 FEM> メッシュ分割 領域分割 領域分割 ( パーティショニングツール ) 全体制御 解析制御 メッシュ hecmw_ctrl.dat 境界条件 材料物性 計算制御パラメータ 可視化パラメータ 領域分割ツール 逐次計算 並列計算 Front ISTR FEM の主な演算 FrontISTR における並列計算のしくみ < 領域分割に基づく並列

More information

4.1 % 7.5 %

4.1 % 7.5 % 2018 (412837) 4.1 % 7.5 % Abstract Recently, various methods for improving computial performance have been proposed. One of these various methods is Multi-core. Multi-core can execute processes in parallel

More information

OpenMP/OpenACC によるマルチコア メニィコア並列プログラミング入門 Fortran 編第 Ⅱ 部 :OpenMP 中島研吾 東京大学情報基盤センター

OpenMP/OpenACC によるマルチコア メニィコア並列プログラミング入門 Fortran 編第 Ⅱ 部 :OpenMP 中島研吾 東京大学情報基盤センター OpenMP/OpenACC によるマルチコア メニィコア並列プログラミング入門 Fortran 編第 Ⅱ 部 :OpenMP 中島研吾 東京大学情報基盤センター 2 OpenMP Login to Reedbush-U Parallel Version of the Code by OpenMP STREAM Data Dependency 3 Hybrid 並列プログラミング スレッド並列 +

More information

OpenMP¤òÍѤ¤¤¿ÊÂÎó·×»»¡Ê£±¡Ë

OpenMP¤òÍѤ¤¤¿ÊÂÎó·×»»¡Ê£±¡Ë 2012 5 24 scalar Open MP Hello World Do (omp do) (omp workshare) (shared, private) π (reduction) PU PU PU 2 16 OpenMP FORTRAN/C/C++ MPI OpenMP 1997 FORTRAN Ver. 1.0 API 1998 C/C++ Ver. 1.0 API 2000 FORTRAN

More information

OpenMP¤òÍѤ¤¤¿ÊÂÎó·×»»¡Ê£±¡Ë

OpenMP¤òÍѤ¤¤¿ÊÂÎó·×»»¡Ê£±¡Ë 2011 5 26 scalar Open MP Hello World Do (omp do) (omp workshare) (shared, private) π (reduction) scalar magny-cours, 48 scalar scalar 1 % scp. ssh / authorized keys 133. 30. 112. 246 2 48 % ssh 133.30.112.246

More information

OpenMP¤òÍѤ¤¤¿ÊÂÎó·×»»¡Ê£²¡Ë

OpenMP¤òÍѤ¤¤¿ÊÂÎó·×»»¡Ê£²¡Ë 2013 5 30 (schedule) (omp sections) (omp single, omp master) (barrier, critical, atomic) program pi i m p l i c i t none integer, parameter : : SP = kind ( 1. 0 ) integer, parameter : : DP = selected real

More information

A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member

A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member (University of Tsukuba), Yasuharu Ohsawa, Member (Kobe

More information

I I / 47

I I / 47 1 2013.07.18 1 I 2013 3 I 2013.07.18 1 / 47 A Flat MPI B 1 2 C: 2 I 2013.07.18 2 / 47 I 2013.07.18 3 / 47 #PJM -L "rscgrp=small" π-computer small: 12 large: 84 school: 24 84 16 = 1344 small school small

More information

Microsoft PowerPoint - GeoFEM.ppt [互換モード]

Microsoft PowerPoint - GeoFEM.ppt [互換モード] 三次元並列有限要素法への OpenMP/MPI ハイブリッド 並列プログラミングモデル適用 中島研吾東京大学情報基盤センター RIKEN AICS Spring School 2014 2 Hybrid 並列プログラミング スレッド並列 + メッセージパッシング OpenMP+ MPI CUDA + MPI, OpenACC + MPI 個人的には自動並列化 +MPI のことを ハイブリッド とは呼んでほしくない

More information

25 II :30 16:00 (1),. Do not open this problem booklet until the start of the examination is announced. (2) 3.. Answer the following 3 proble

25 II :30 16:00 (1),. Do not open this problem booklet until the start of the examination is announced. (2) 3.. Answer the following 3 proble 25 II 25 2 6 13:30 16:00 (1),. Do not open this problem boolet until the start of the examination is announced. (2) 3.. Answer the following 3 problems. Use the designated answer sheet for each problem.

More information

. (.8.). t + t m ü(t + t) + c u(t + t) + k u(t + t) = f(t + t) () m ü f. () c u k u t + t u Taylor t 3 u(t + t) = u(t) + t! u(t) + ( t)! = u(t) + t u(

. (.8.). t + t m ü(t + t) + c u(t + t) + k u(t + t) = f(t + t) () m ü f. () c u k u t + t u Taylor t 3 u(t + t) = u(t) + t! u(t) + ( t)! = u(t) + t u( 3 8. (.8.)............................................................................................3.............................................4 Nermark β..........................................

More information

XcalableMP入門

XcalableMP入門 XcalableMP 1 HPC-Phys@, 2018 8 22 XcalableMP XMP XMP Lattice QCD!2 XMP MPI MPI!3 XMP 1/2 PCXMP MPI Fortran CCoarray C++ MPIMPI XMP OpenMP http://xcalablemp.org!4 XMP 2/2 SPMD (Single Program Multiple Data)

More information

インテル(R) Visual Fortran Composer XE 2013 Windows版 入門ガイド

インテル(R) Visual Fortran Composer XE 2013 Windows版 入門ガイド Visual Fortran Composer XE 2013 Windows* エクセルソフト株式会社 www.xlsoft.com Rev. 1.1 (2012/12/10) Copyright 1998-2013 XLsoft Corporation. All Rights Reserved. 1 / 53 ... 3... 4... 4... 5 Visual Studio... 9...

More information

16.16%

16.16% 2017 (411824) 16.16% Abstract Multi-core processor is common technique for high computing performance. In many multi-core processor architectures, all processors share L2 and last level cache memory. Thus,

More information

1 # include < stdio.h> 2 # include < string.h> 3 4 int main (){ 5 char str [222]; 6 scanf ("%s", str ); 7 int n= strlen ( str ); 8 for ( int i=n -2; i

1 # include < stdio.h> 2 # include < string.h> 3 4 int main (){ 5 char str [222]; 6 scanf (%s, str ); 7 int n= strlen ( str ); 8 for ( int i=n -2; i ABC066 / ARC077 writer: nuip 2017 7 1 For International Readers: English editorial starts from page 8. A : ringring a + b b + c a + c a, b, c a + b + c 1 # include < stdio.h> 2 3 int main (){ 4 int a,

More information

課題 S1 解説 Fortran 編 中島研吾 東京大学情報基盤センター

課題 S1 解説 Fortran 編 中島研吾 東京大学情報基盤センター 課題 S1 解説 Fortran 編 中島研吾 東京大学情報基盤センター 内容 課題 S1 /a1.0~a1.3, /a2.0~a2.3 から局所ベクトル情報を読み込み, 全体ベクトルのノルム ( x ) を求めるプログラムを作成する (S1-1) file.f,file2.f をそれぞれ参考にする 下記の数値積分の結果を台形公式によって求めるプログラムを作成する

More information

untitled

untitled Fortran90 ( ) 17 12 29 1 Fortran90 Fortran90 FORTRAN77 Fortran90 1 Fortran90 module 1.1 Windows Windows UNIX Cygwin (http://www.cygwin.com) C\: Install Cygwin f77 emacs latex ps2eps dvips Fortran90 Intel

More information

120802_MPI.ppt

120802_MPI.ppt CPU CPU CPU CPU CPU SMP Symmetric MultiProcessing CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CP OpenMP MPI MPI CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU MPI MPI+OpenMP CPU CPU CPU CPU CPU CPU CPU CP

More information

openmp1_Yaguchi_version_170530

openmp1_Yaguchi_version_170530 並列計算とは /OpenMP の初歩 (1) 今 の内容 なぜ並列計算が必要か? スーパーコンピュータの性能動向 1ExaFLOPS 次世代スハ コン 京 1PFLOPS 性能 1TFLOPS 1GFLOPS スカラー機ベクトル機ベクトル並列機並列機 X-MP ncube2 CRAY-1 S-810 SR8000 VPP500 CM-5 ASCI-5 ASCI-4 S3800 T3E-900 SR2201

More information

AtCoder Regular Contest 073 Editorial Kohei Morita(yosupo) A: Shiritori if python3 a, b, c = input().split() if a[len(a)-1] == b[0] and b[len(

AtCoder Regular Contest 073 Editorial Kohei Morita(yosupo) A: Shiritori if python3 a, b, c = input().split() if a[len(a)-1] == b[0] and b[len( AtCoder Regular Contest 073 Editorial Kohei Morita(yosupo) 29 4 29 A: Shiritori if python3 a, b, c = input().split() if a[len(a)-1] == b[0] and b[len(b)-1] == c[0]: print( YES ) else: print( NO ) 1 B:

More information

Introduction Purpose This training course demonstrates the use of the High-performance Embedded Workshop (HEW), a key tool for developing software for

Introduction Purpose This training course demonstrates the use of the High-performance Embedded Workshop (HEW), a key tool for developing software for Introduction Purpose This training course demonstrates the use of the High-performance Embedded Workshop (HEW), a key tool for developing software for embedded systems that use microcontrollers (MCUs)

More information

Microsoft Word - 03-数値計算の基礎.docx

Microsoft Word - 03-数値計算の基礎.docx δx f x 0 + δ x n=0 a n = f ( n) ( x 0 ) n δx n f x x=0 sin x = x x3 3 + x5 5 x7 7 +... x ( ) = a n δ x n ( ) = sin x ak = (-mod(k,2))**(k/2) / fact_k 10 11 I = f x dx a ΔS = f ( x)h I = f a h I = h b (

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2018.06.04 2018.06.04 1 / 62 2018.06.04 2 / 62 Windows, Mac Unix 0444-J 2018.06.04 3 / 62 Part I Unix GUI CUI: Unix, Windows, Mac OS Part II 2018.06.04 4 / 62 0444-J ( : ) 6 4 ( ) 6 5 * 6 19 SX-ACE * 6

More information

Microsoft Word - Meta70_Preferences.doc

Microsoft Word - Meta70_Preferences.doc Image Windows Preferences Edit, Preferences MetaMorph, MetaVue Image Windows Preferences Edit, Preferences Image Windows Preferences 1. Windows Image Placement: Acquire Overlay at Top Left Corner: 1 Acquire

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2018.09.10 furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 1 / 59 furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 2 / 59 Windows, Mac Unix 0444-J furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 3 / 59 Part I Unix GUI CUI:

More information

Microsoft PowerPoint - S1-ref-F.ppt [互換モード]

Microsoft PowerPoint - S1-ref-F.ppt [互換モード] 課題 S1 解説 Fortran 言語編 RIKEN AICS HPC Summer School 2014 中島研吾 ( 東大 情報基盤センター ) 横川三津夫 ( 神戸大 計算科学教育センター ) MPI Programming 課題 S1 (1/2) /a1.0~a1.3, /a2.0~a2.3 から局所ベクトル情報を読み込み, 全体ベクトルのノルム ( x ) を求めるプログラムを作成する

More information

1F90/kouhou_hf90.dvi

1F90/kouhou_hf90.dvi Fortran90 3 33 1 2 Fortran90 FORTRAN 1956 IBM IBM704 FORTRAN(FORmula TRANslation ) 1965 FORTRAN66 1978 FORTRAN77 1991 Fortran90 Fortran90 Fortran Fortran90 6 Fortran90 77 90 90 Fortran90 [ ] Fortran90

More information

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2 FFT 1 Fourier fast Fourier transform FFT FFT FFT 1 FFT FFT 2 Fourier 2.1 Fourier FFT Fourier discrete Fourier transform DFT DFT n 1 y k = j=0 x j ω jk n, 0 k n 1 (1) x j y k ω n = e 2πi/n i = 1 (1) n DFT

More information

Microsoft PowerPoint - 06-S2-ref-F.pptx

Microsoft PowerPoint - 06-S2-ref-F.pptx 並列有限要素法による 一次元定常熱伝導解析プログラム Fortran 編 中島研吾東京大学情報基盤センター お試しアカウント付き講習会 MPI 応用編 : 並列有限要素法 S2-ref 2 問題の概要, 実行方法 プログラムの説明 計算例 FEM1D 3 対象とする問題 : 一次元熱伝導問題 体積当たり一様発熱 Q x T x Q 0 x=0 (x min ) x= x max 一様な : 断面積

More information

01_OpenMP_osx.indd
