2012年度HPCサマーセミナー_多田野.pptx


1 CCS HPC Summer Seminar (title slide)


4 Linear system $Ax = b$

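5 Linear system $Ax = b$, where
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$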

6 Solution methods for $Ax = b$
Direct methods (Gaussian elimination, LU decomposition): 1) [...] 2) [...]
Krylov subspace methods: 1) [...] 2) [...]

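7 Krylov subspace methods for $Ax = b$
Approximate solution: $x_k = x_0 + z_k$, $z_k \in \mathcal{K}_k(A; r_0)$
Krylov subspace: $\mathcal{K}_k(A; r_0) = \mathrm{span}(r_0, Ar_0, \ldots, A^{k-1} r_0)$
Residual: $r_k = b - Ax_k = r_0 - Az_k \in \mathcal{K}_{k+1}(A; r_0)$
Iterates: $x_0 \to x_1 \to x_2 \to \cdots \to x_{k-1} \to x_k \to x$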

8 Krylov subspace methods 1: Hermitian matrices, $A = A^H$
Conjugate Gradient method: CG
Conjugate Residual method: CR
Minimal Residual method: MINRES
Hermitian matrix: $A = A^H = \bar{A}^T$ ($a_{ij} = \bar{a}_{ji}$)

9 CG method (Hermitian case)
$x_0$ is an initial guess,
Compute $r_0 = b - Ax_0$,
Set $p_0 = r_0$,
For $k = 0, 1, \ldots$, until $\|r_k\|_2 \le \varepsilon_{\mathrm{TOL}} \|b\|_2$ do:
  $q_k = Ap_k$,
  $\alpha_k = (r_k, r_k) / (p_k, q_k)$,
  $x_{k+1} = x_k + \alpha_k p_k$,
  $r_{k+1} = r_k - \alpha_k q_k$,
  $\beta_k = (r_{k+1}, r_{k+1}) / (r_k, r_k)$,
  $p_{k+1} = r_{k+1} + \beta_k p_k$,
End For
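The CG iteration on slide 9 can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python rather than the Fortran of the slides, a dense real symmetric positive definite matrix stored as a list of rows, and a zero initial guess. The helper names `matvec`, `dot` and `cg` are hypothetical.

```python
import math

def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Inner product (u, v); real data assumed, so no conjugation is needed."""
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, tol=1e-12, max_iter=1000):
    """CG for a real symmetric positive definite A (assumption of this sketch)."""
    x = [0.0] * len(b)          # x_0 = 0, so r_0 = b - A x_0 = b
    r = list(b)
    p = list(r)                 # p_0 = r_0
    rr = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    for k in range(max_iter):
        if math.sqrt(rr) <= tol * b_norm:   # ||r_k||_2 <= eps_TOL ||b||_2
            return x, k
        q = matvec(A, p)                    # q_k = A p_k
        alpha = rr / dot(p, q)              # alpha_k = (r_k, r_k) / (p_k, q_k)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)
        beta = rr_new / rr                  # beta_k = (r_{k+1}, r_{k+1}) / (r_k, r_k)
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = cg(A, b)
```

For this 2 x 2 system the exact solution is $x = (1/11, 7/11)$, which the sketch reproduces to rounding error.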

10 Krylov subspace methods 2: non-Hermitian matrices, $A \ne A^H$
Bi-Conjugate Gradient: BiCG
Conjugate Gradient Squared: CGS
Bi-Conjugate Gradient Stabilized: BiCGSTAB
Generalized Conjugate Residual: GCR
Generalized Minimal Residual: GMRES

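11 BiCG method (non-Hermitian case)
$x_0$ is an initial guess,
Compute $r_0 = b - Ax_0$,
Choose $\tilde{r}_0$ such that $(\tilde{r}_0, r_0) \ne 0$,
Set $p_0 = r_0$ and $\tilde{p}_0 = \tilde{r}_0$,
For $k = 0, 1, \ldots$, until $\|r_k\|_2 \le \varepsilon_{\mathrm{TOL}} \|b\|_2$ do:
  $q_k = Ap_k$, $\tilde{q}_k = A^H \tilde{p}_k$,
  $\alpha_k = (\tilde{r}_k, r_k) / (\tilde{p}_k, q_k)$,
  $x_{k+1} = x_k + \alpha_k p_k$,
  $r_{k+1} = r_k - \alpha_k q_k$, $\tilde{r}_{k+1} = \tilde{r}_k - \bar{\alpha}_k \tilde{q}_k$,
  $\beta_k = (\tilde{r}_{k+1}, r_{k+1}) / (\tilde{r}_k, r_k)$,
  $p_{k+1} = r_{k+1} + \beta_k p_k$, $\tilde{p}_{k+1} = \tilde{r}_{k+1} + \bar{\beta}_k \tilde{p}_k$,
End For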

12 GCR method (non-Hermitian case)
$x_0$ is an initial guess,
Compute $r_0 = b - Ax_0$,
Set $p_0 = r_0$ and $q_0 = s_0 = Ar_0$,
For $k = 0, 1, \ldots$, until $\|r_k\|_2 \le \varepsilon_{\mathrm{TOL}} \|b\|_2$ do:
  $\alpha_k = (q_k, r_k) / (q_k, q_k)$,
  $x_{k+1} = x_k + \alpha_k p_k$,
  $r_{k+1} = r_k - \alpha_k q_k$,
  $s_{k+1} = Ar_{k+1}$,
  $\beta_{k,i} = -(q_i, s_{k+1}) / (q_i, q_i)$, $(i = 0, 1, \ldots, k)$
  $p_{k+1} = r_{k+1} + \sum_{i=0}^{k} \beta_{k,i} p_i$,
  $q_{k+1} = s_{k+1} + \sum_{i=0}^{k} \beta_{k,i} q_i$,
End For

13 [Figure: convergence histories of BiCG, CGS, BiCGSTAB and GCR. Vertical axis: relative residual norm $\|r_k\|_2 / \|b\|_2$; horizontal axis: iteration number $k$.]

14 Krylov subspace methods 3: complex symmetric matrices, $A = A^T \ne A^H$
Conjugate Orthogonal Conjugate Gradient: COCG
Complex symmetric matrix: $A = A^T \ne A^H$ ($a_{ij} = a_{ji} \ne \bar{a}_{ji}$)

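15 COCG method (complex symmetric case)
$x_0$ is an initial guess,
Compute $r_0 = b - Ax_0$,
Set $p_0 = r_0$,
For $k = 0, 1, \ldots$, until $\|r_k\|_2 \le \varepsilon_{\mathrm{TOL}} \|b\|_2$ do:
  $q_k = Ap_k$,
  $\alpha_k = (\bar{r}_k, r_k) / (\bar{p}_k, q_k)$,
  $x_{k+1} = x_k + \alpha_k p_k$,
  $r_{k+1} = r_k - \alpha_k q_k$,
  $\beta_k = (\bar{r}_{k+1}, r_{k+1}) / (\bar{r}_k, r_k)$,
  $p_{k+1} = r_{k+1} + \beta_k p_k$,
End For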
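The COCG recurrence has the same shape as CG; the difference is that the products $(\bar{r}, r)$ and $(\bar{p}, q)$ are the unconjugated sums $r^T r$ and $p^T q$. A minimal Python sketch follows, assuming a small dense complex symmetric matrix and a zero initial guess. The names `bilinear`, `norm2` and `cocg` are hypothetical, and breakdown (a zero denominator) is not handled.

```python
import math

def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def bilinear(u, v):
    """Unconjugated product u^T v, i.e. (conj(u), v) with (x, y) = x^H y."""
    return sum(a * b for a, b in zip(u, v))

def norm2(v):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(a) ** 2 for a in v))

def cocg(A, b, tol=1e-12, max_iter=100):
    """COCG for a complex symmetric A (A == A^T); no breakdown handling."""
    x = [0j] * len(b)           # x_0 = 0, so r_0 = b
    r = list(b)
    p = list(r)
    rho = bilinear(r, r)
    for k in range(max_iter):
        if norm2(r) <= tol * norm2(b):
            return x, k
        q = matvec(A, p)
        alpha = rho / bilinear(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = bilinear(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x, max_iter

A = [[2 + 1j, 1], [1, 3 - 1j]]   # complex symmetric, not Hermitian
b = [1, 2]
x, iters = cocg(A, b)
residual = norm2([bi - yi for bi, yi in zip(b, matvec(A, x))])
```

Only one matrix-vector product with $A$ is needed per iteration, and $A^H$ is never applied.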

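16 Test problem: two-dimensional Poisson equation
$\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} = f$ in $\Omega$, $u = \bar{u}$ on $\partial\Omega$
$f$, $\bar{u}$: given functions; $\Omega$: the unit square $0 \le x, y \le 1$
Each of $x$ and $y$ is divided into $(M+1)$ intervals and the equation is discretized with a 5-point difference scheme.
Unknowns: $M \times M$ grid points; the coefficient matrix has $M^4$ entries, of which $5M^2 - 4M$ are nonzero.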
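The nonzero count on slide 16 can be checked by brute force. The sketch below assumes that the garbled figures refer to an $M \times M$ grid of unknowns with a 5-point stencil, giving $5M^2 - 4M$ nonzeros; `poisson_nnz` is a hypothetical helper, written in Python for illustration.

```python
def poisson_nnz(M):
    """Count nonzeros of the 5-point Laplacian on an M x M grid of unknowns."""
    count = 0
    for j in range(M):
        for i in range(M):
            count += 1                      # diagonal entry
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                if 0 <= i + di < M and 0 <= j + dj < M:
                    count += 1              # neighbour that is also an unknown
    return count

counts = {M: poisson_nnz(M) for M in range(1, 7)}
```

Each grid point contributes one diagonal entry and one entry per neighbour inside the grid, so the $4M$ missing neighbours along the four edges account for the $-4M$ term.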

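17 Compressed Row Storage (CRS) format
$$A = \begin{pmatrix} a_{11} & 0 & a_{13} & 0 & a_{15} \\ 0 & a_{22} & 0 & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & 0 & 0 \\ 0 & 0 & a_{43} & a_{44} & 0 \\ 0 & a_{52} & 0 & a_{54} & a_{55} \end{pmatrix}$$
val: nonzero values, stored row by row
col_ind: column index of each entry of val
row_ptr: position in val at which each row starts
val: $a_{11}\ a_{13}\ a_{15}\ a_{22}\ a_{24}\ a_{25}\ a_{31}\ a_{32}\ a_{33}\ a_{43}\ a_{44}\ a_{52}\ a_{54}\ a_{55}$
col_ind: 1 3 5 2 4 5 1 2 3 3 4 2 4 5
row_ptr: 1 4 7 10 12 15
(Indices are 1-based; the col_ind and row_ptr values are derived from the val sequence.)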
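The three CRS arrays named on slide 17 (val, col_ind, row_ptr) can be generated from a dense matrix as sketched below. This is an illustration in Python, not code from the slides: the entry $a_{ij}$ is encoded as the integer $10i + j$, the sparsity pattern is the one implied by the val sequence on the slide, and indices are kept 1-based to match the Fortran loops. `dense_to_crs` is a hypothetical helper.

```python
def dense_to_crs(A):
    """Build 1-based CRS arrays (val, col_ind, row_ptr) from a dense matrix."""
    val, col_ind, row_ptr = [], [], [1]
    for row in A:
        for j, a in enumerate(row, start=1):
            if a != 0:
                val.append(a)
                col_ind.append(j)
        row_ptr.append(len(val) + 1)    # where the next row starts in val
    return val, col_ind, row_ptr

# a_ij is encoded as 10*i + j; zeros mark structurally empty positions.
A = [
    [11, 0, 13, 0, 15],
    [0, 22, 0, 24, 25],
    [31, 32, 33, 0, 0],
    [0, 0, 43, 44, 0],
    [0, 52, 0, 54, 55],
]
val, col_ind, row_ptr = dense_to_crs(A)
```

The last entry of row_ptr is one past the number of nonzeros, which is what lets the loop bound `row_ptr(i+1)-1` work for the final row.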

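18 Compressed Column Storage (CCS) format
$$A = \begin{pmatrix} a_{11} & 0 & a_{13} & 0 & a_{15} \\ 0 & a_{22} & 0 & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & 0 & 0 \\ 0 & 0 & a_{43} & a_{44} & 0 \\ 0 & a_{52} & 0 & a_{54} & a_{55} \end{pmatrix}$$
val: nonzero values, stored column by column
row_ind: row index of each entry of val
col_ptr: position in val at which each column starts
val: $a_{11}\ a_{31}\ a_{22}\ a_{32}\ a_{52}\ a_{13}\ a_{33}\ a_{43}\ a_{24}\ a_{44}\ a_{54}\ a_{15}\ a_{25}\ a_{55}$
row_ind: 1 3 2 3 5 1 3 4 2 4 5 1 2 5
col_ptr: 1 3 6 9 12 15
(Indices are 1-based; the row_ind and col_ptr values are derived from the val sequence.)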

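19 Matrix-vector product in CRS format: $y = Ax$
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
Fortran code:
    do i=1,n
      y(i) = 0.0D0
      do j=row_ptr(i), row_ptr(i+1)-1
        y(i) = y(i)+val(j)*x(col_ind(j))
      end do
    end do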
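The CRS loop on slide 19 translates almost line for line. The Python sketch below keeps the 1-based index arrays of the Fortran version and shifts by one when indexing Python lists; the 5 x 5 example data (with $a_{ij}$ encoded as $10i + j$) is an assumption taken from the pattern of slide 17, and `crs_matvec` is a hypothetical name.

```python
def crs_matvec(val, col_ind, row_ptr, x):
    """y = A x for A in CRS format with 1-based col_ind and row_ptr."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        # Fortran: do j = row_ptr(i), row_ptr(i+1)-1  (shifted to 0-based here)
        for j in range(row_ptr[i] - 1, row_ptr[i + 1] - 1):
            y[i] += val[j] * x[col_ind[j] - 1]
    return y

val = [11, 13, 15, 22, 24, 25, 31, 32, 33, 43, 44, 52, 54, 55]
col_ind = [1, 3, 5, 2, 4, 5, 1, 2, 3, 3, 4, 2, 4, 5]
row_ptr = [1, 4, 7, 10, 12, 15]
y = crs_matvec(val, col_ind, row_ptr, [1.0, 1.0, 1.0, 1.0, 1.0])
```

With a vector of ones, each $y_i$ is the sum of row $i$, which makes the result easy to check by hand. Every row writes only its own $y_i$, so rows can be processed independently.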

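20 Matrix-vector product in CCS format: $y = Ax$
$$y = [a_1, a_2, \ldots, a_n] \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i=1}^{n} a_i x_i$$
($a_i$: the $i$-th column of $A$)
Fortran code:
    do i=1,n
      y(i) = 0.0D0
    end do
    do j=1,n
      do i=col_ptr(j),col_ptr(j+1)-1
        y(row_ind(i)) = y(row_ind(i))+val(i)*x(j)
      end do
    end do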
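The CCS product on slide 20 accumulates $y$ as a sum of scaled columns. A Python sketch under the same assumptions as before (1-based index arrays, a square matrix, the 5 x 5 pattern of slide 18 with $a_{ij}$ encoded as $10i + j$); `ccs_matvec` is a hypothetical name.

```python
def ccs_matvec(val, row_ind, col_ptr, x):
    """y = A x for a square A in CCS format with 1-based row_ind and col_ptr."""
    n = len(col_ptr) - 1
    y = [0.0] * n                       # y must be cleared before accumulation
    for j in range(n):
        # Column j contributes x_j * a_j to y.
        for i in range(col_ptr[j] - 1, col_ptr[j + 1] - 1):
            y[row_ind[i] - 1] += val[i] * x[j]
    return y

val = [11, 31, 22, 32, 52, 13, 33, 43, 24, 44, 54, 15, 25, 55]
row_ind = [1, 3, 2, 3, 5, 1, 3, 4, 2, 4, 5, 1, 2, 5]
col_ptr = [1, 3, 6, 9, 12, 15]
y = ccs_matvec(val, row_ind, col_ptr, [1.0, 2.0, 3.0, 4.0, 5.0])
```

Unlike the CRS loop, different columns write to the same entries of $y$, so a parallel version has to combine partial results.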

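21 Parallel $y = Ax$ in CRS format: $A$ is split into row blocks owned by Proc. 0 to Proc. 3. Each process multiplies its block by $x$ to obtain its part of $y$, and the parts are collected on Proc. 0 with MPI_Gather.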

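22 Parallel $y = Ax$ in CCS format: $A$ is split into column blocks owned by Proc. 0 to Proc. 3. Each process multiplies its block by the matching part of $x$, and the resulting partial vectors are summed into $y$ on Proc. 0 with MPI_Reduce.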

23 Parallel inner product $(x, y) = \sum_{j=1}^{n} \bar{x}_j y_j$: $x$ and $y$ are split across Proc. 0 to Proc. 3. Each process computes a partial sum tmp_sum, and the partial sums are added into sum on Proc. 0 with MPI_Reduce.

24 MPI program for the inner product
    program main
    include 'mpif.h'
    ...
    call mpi_init(ierr)
    call mpi_comm_size(mpi_comm_world, nprocs, ierr)
    call mpi_comm_rank(mpi_comm_world, myrank, ierr)
    ...
    ! each process sums over its own index range
    tmp_sum = (0.0D0, 0.0D0)
    do i=istart(myrank+1), iend(myrank+1)
      tmp_sum = tmp_sum + conjg(x(i)) * y(i)
    end do
    ! add the partial sums into sum on process 0
    call mpi_reduce(tmp_sum, sum, 1, mpi_double_complex, &
                    mpi_sum, 0, mpi_comm_world, ierr)
    ...
    call mpi_finalize(ierr)
    end program main
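The decomposition behind the MPI inner product on slides 23 and 24 can be tried without an MPI installation. The Python sketch below only simulates the ranks in a loop: `block_ranges` is a hypothetical stand-in for the istart/iend arrays, and the final `sum` plays the role of MPI_Reduce with MPI_SUM. It is not MPI code.

```python
def block_ranges(n, nprocs):
    """Split indices 0..n-1 into nprocs contiguous blocks (hypothetical helper)."""
    base, extra = divmod(n, nprocs)
    ranges, start = [], 0
    for rank in range(nprocs):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def partial_dot(x, y, start, end):
    """tmp_sum of one simulated rank: sum of conj(x_i) * y_i over its block."""
    return sum(x[i].conjugate() * y[i] for i in range(start, end))

def reduced_dot(x, y, nprocs):
    """Sum the per-rank partial sums, standing in for MPI_Reduce(MPI_SUM)."""
    partials = [partial_dot(x, y, s, e) for s, e in block_ranges(len(x), nprocs)]
    return sum(partials)

x = [1 + 1j, 2, 3j, 4 - 1j, 5]
y = [1, 1j, 2, 1 + 1j, 2]
serial = reduced_dot(x, y, 1)
parallel = reduced_dot(x, y, 4)
```

In floating-point arithmetic the order of summation can change the last digits, so a real parallel run may differ slightly from the serial result; the small integer data used here avoids that.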

25 Parallel vector update $y = y + ax$: $x$ and $y$ are split across Proc. 0 to Proc. 3. The scalar $a$ is sent to every process with MPI_Bcast, and each process updates its own part of $y$.

26 Krylov subspace methods for $Ax = b$

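27 Preconditioning
Approximate factorization $A \approx K_1 K_2$, so that $K_1^{-1} A K_2^{-1} \approx I$:
  $Ax = b \;\Rightarrow\; (K_1^{-1} A K_2^{-1})(K_2 x) = K_1^{-1} b$
Approximate inverse $M \approx A^{-1}$, so that $AM \approx I$, $MA \approx I$:
  Left preconditioning: $Ax = b \;\Rightarrow\; MAx = Mb$
  Right preconditioning: $Ax = b \;\Rightarrow\; (AM)(M^{-1}x) = b$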

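28 Approximate inverse preconditioning: $M \approx A^{-1}$
$M$ is chosen to minimize $F(M) = \|I - AM\|_F^2$:
  $F(M) = \|I - AM\|_F^2 = \sum_{j=1}^{n} \|e_j - Am_j\|_2^2$
($e_j$, $m_j$: the $j$-th columns of $I$ and $M$; each column $m_j$ is an independent least-squares problem.)
Frobenius norm: $\|A\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2}$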

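29 Approximate inverse by the Neumann series
If $\|I - A\| < 1$, then $A^{-1} = [I - (I - A)]^{-1} = \sum_{j=0}^{\infty} (I - A)^j$
Truncating the series after the term $j = m$ gives an approximate inverse $M \approx A^{-1}$:
  $M = \sum_{j=0}^{m} (I - A)^j$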
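The truncated Neumann series of slide 29 is easy to experiment with on a small dense matrix. The Python sketch below is an illustration only: the helper names are hypothetical, and the 2 x 2 matrix is an arbitrary example chosen so that $\|I - A\| < 1$. It measures $\|I - AM\|_F$, which in exact arithmetic equals $\|(I - A)^{m+1}\|_F$.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Dense product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def neumann_inverse(A, m):
    """M = sum_{j=0}^{m} (I - A)^j, the truncated Neumann series."""
    n = len(A)
    N = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    M = identity(n)             # j = 0 term
    power = identity(n)
    for _ in range(m):
        power = matmul(power, N)            # next power of (I - A)
        M = [[M[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return M

def residual_norm(A, M):
    """Frobenius norm of I - A M."""
    n = len(A)
    AM = matmul(A, M)
    return math.sqrt(sum(((1.0 if i == j else 0.0) - AM[i][j]) ** 2
                         for i in range(n) for j in range(n)))

A = [[1.0, 0.2], [0.1, 0.9]]
errors = [residual_norm(A, neumann_inverse(A, m)) for m in (0, 2, 8, 20)]
```

Each extra term costs one more product with $I - A$, so $m$ trades preconditioner quality against the cost of applying $M$.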

30 [Figure: convergence histories of two BiCG variants. Vertical axis: relative residual norm $\|r_k\|_2 / \|b\|_2$; horizontal axis: iteration number $k$.]


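32 Linear systems with $L$ right-hand sides: $AX = B$
$A$: $n \times n$ matrix, $X = [x^{(1)}, x^{(2)}, \ldots, x^{(L)}]$, $B = [b^{(1)}, b^{(2)}, \ldots, b^{(L)}]$
Direct methods: a single factorization $A = LU$ can be reused for all $L$ right-hand sides.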

33 Block Krylov subspace methods
Block BiCG: O'Leary (1980)
Block GMRES: Vital (1990)
Block QMR: Freund (1997)
Block BiCGSTAB: Guennouni (2003)
Block BiCGGR: Tadano (2009)

34 [Figure: convergence of a Block Krylov subspace method for $L = 1$, $L = 2$ and $L = 4$. Vertical axis: relative residual norm; horizontal axis: iteration number.]

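35 Block BiCGSTAB method
$X_0 \in \mathbb{C}^{n \times L}$ is an initial guess,
Compute $R_0 = B - AX_0$,
Set $P_0 = R_0$,
Choose $\tilde{R}_0 \in \mathbb{C}^{n \times L}$,
For $k = 0, 1, \ldots$, until $\|R_k\|_F \le \varepsilon \|B\|_F$ do:
  $V_k = AP_k$,
  Solve $(\tilde{R}_0^H V_k)\alpha_k = \tilde{R}_0^H R_k$ for $\alpha_k$,
  $T_k = R_k - V_k \alpha_k$,
  $Z_k = AT_k$,
  $\zeta_k = \mathrm{Tr}[Z_k^H T_k] / \mathrm{Tr}[Z_k^H Z_k]$,
  $X_{k+1} = X_k + P_k \alpha_k + \zeta_k T_k$,
  $R_{k+1} = T_k - \zeta_k Z_k$,
  Solve $(\tilde{R}_0^H V_k)\beta_k = -\tilde{R}_0^H Z_k$ for $\beta_k$,
  $P_{k+1} = R_{k+1} + (P_k - \zeta_k V_k)\beta_k$,
End
Compared with BiCGSTAB:
1. The iterates are $n \times L$ matrices instead of single vectors ($1 \to L$).
2. $\alpha_k$ and $\beta_k$ are $L \times L$ matrices.
3. [...]
4. $\zeta_k$ is a scalar computed from traces $\mathrm{Tr}[\cdot]$.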

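36 Matrix-vector products in CRS format with $L$ vectors: $Y = AX$, where $Y$, $X$ are $n \times L$
    do k=1,l
      do i=1,n
        do j=row_ptr(i), row_ptr(i+1)-1
          Y(i,k)=Y(i,k)+A(j)*X(col_ind(j),k)
        end do
      end do
    end do
[Issue] With this loop order the matrix data (A, col_ind) are traversed $L$ times, once per column of $X$, and the accesses to X are indirect.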

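37 [Improvement] $X$ and $Y$ are stored transposed ($L \times n$)
    do i=1,n
      do j=row_ptr(i), row_ptr(i+1)-1
        do k=1,l
          Y(k,i)=Y(k,i)+A(j)*X(k,col_ind(j))
        end do
      end do
    end do
The $L$ entries X(1:l,col_ind(j)) are contiguous in memory, the matrix data are traversed only once, and Y is also accessed contiguously.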
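The loop order of slide 37 can be mimicked in Python by storing $X$ and $Y$ as lists of length-$L$ rows, so that the $L$ values belonging to one matrix column sit next to each other. This is a sketch under assumptions: the 5 x 5 CRS data (with $a_{ij}$ encoded as $10i + j$) comes from the pattern on slide 17, and `crs_matmat` is a hypothetical name. Python lists do not reproduce Fortran's memory behaviour; the point is only the loop structure.

```python
def crs_matmat(val, col_ind, row_ptr, X, L):
    """Y = A X for L right-hand sides; X[j] holds the L entries of row j of X."""
    n = len(row_ptr) - 1
    Y = [[0.0] * L for _ in range(n)]
    for i in range(n):
        for j in range(row_ptr[i] - 1, row_ptr[i + 1] - 1):
            a = val[j]                      # matrix entry is read once ...
            xrow = X[col_ind[j] - 1]
            for k in range(L):              # ... and reused for all L vectors
                Y[i][k] += a * xrow[k]
    return Y

val = [11, 13, 15, 22, 24, 25, 31, 32, 33, 43, 44, 52, 54, 55]
col_ind = [1, 3, 5, 2, 4, 5, 1, 2, 3, 3, 4, 2, 4, 5]
row_ptr = [1, 4, 7, 10, 12, 15]
# Two right-hand sides: a vector of ones and the vector (1, 2, 3, 4, 5).
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
Y = crs_matmat(val, col_ind, row_ptr, X, 2)
```

The innermost loop over k has a fixed, short trip count and unit stride, which is the property the transposed layout is meant to provide.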

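38 Product of an $n \times L$ matrix and an $L \times L$ matrix
$T_k = R_k - V_k \alpha_k$, computed in transposed form: $T_k^T = R_k^T - \alpha_k^T V_k^T$
    do j=1,n
      do i=1,l
        T(i,j)=R(i,j)
      end do
    end do
    do j=1,n
      do i=1,l
        do k=1,l
          T(k,j)=T(k,j)-Alpha(k,i)*V(i,j)
        end do
      end do
    end do
Alpha holds $\alpha_k^T$.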

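39 Product of an $L \times n$ matrix and an $n \times L$ matrix (needed to compute $\alpha_k$, $\beta_k$)
$C_k = \tilde{R}_0^H V_k$
    do j=1,n
      do i=1,l
        do k=1,l
          C(k,i)=C(k,i)+R0(k,j)*V(i,j)
        end do
      end do
    end do
R0 holds $\tilde{R}_0^H$ (an $L \times n$ array); C holds $C_k$.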

40 Parallelization with OpenMP
    !$OMP PARALLEL
      [parallel region]
    !$OMP END PARALLEL
The code between !$OMP PARALLEL and !$OMP END PARALLEL is executed by multiple threads.

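41 OpenMP parallelization 1: matrix-vector products
    !$OMP DO
    do i=1,n
      do j=row_ptr(i), row_ptr(i+1)-1
        do k=1,l
          Y(k,i)=Y(k,i)+A(j)*X(k,col_ind(j))
        end do
      end do
    end do
The iterations of the do loop that immediately follows !$OMP DO are divided among the threads.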

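42 OpenMP parallelization 2: product of an $n \times L$ matrix and an $L \times L$ matrix
    !$OMP DO
    do j=1,n
      do i=1,l
        T(i,j)=R(i,j)
      end do
    end do
    !$OMP DO
    do j=1,n
      do i=1,l
        do k=1,l
          T(k,j)=T(k,j)-Alpha(k,i)*V(i,j)
        end do
      end do
    end do
Each of the two loops over j is divided among the threads.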

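43 OpenMP parallelization 3: product of an $L \times n$ matrix and an $n \times L$ matrix (reduction)
    NTH = OMP_GET_NUM_THREADS()
    MYID = OMP_GET_THREAD_NUM()+1
    !$OMP SINGLE
    allocate( TMP(L,L,NTH) )
    !$OMP END SINGLE
NTH: number of threads
MYID: thread number (starting from 1)
TMP: work array holding one $L \times L$ partial result per thread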

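44 OpenMP parallelization 3 (continued): $C_k = \tilde{R}_0^H V_k$
    !$OMP DO
    do j=1,n
      do i=1,l
        do k=1,l
          TMP(k,i,MYID) = TMP(k,i,MYID)+R0(k,j)*V(i,j)
        end do
      end do
    end do
    !$OMP BARRIER
    !$OMP SINGLE
    do k=1,nth
      do j=1,l
        do i=1,l
          C(i,j) = C(i,j) + TMP(i,j,k)
        end do
      end do
    end do
    !$OMP END SINGLE
The per-thread partial results in TMP are summed into C (reduction).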
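The reduction pattern of slides 43 and 44 (one private $L \times L$ buffer per thread, then a serial sum) can be illustrated without threads. The Python sketch below only simulates the threads: the assignment of index j to a thread in contiguous chunks is an assumption made for the illustration, and `blocked_product` is a hypothetical name. R0 and V are stored as $L \times n$ arrays, as in the Fortran code.

```python
def blocked_product(R0, V, L, n, nth):
    """C(k,i) = sum_j R0(k,j) * V(i,j), accumulated via per-thread buffers."""
    # TMP[t] is the private L x L buffer of simulated thread t.
    TMP = [[[0.0] * L for _ in range(L)] for _ in range(nth)]
    for j in range(n):
        t = j * nth // n                    # contiguous chunks of j per thread (assumed)
        for i in range(L):
            for k in range(L):
                TMP[t][k][i] += R0[k][j] * V[i][j]
    # Serial part (the SINGLE region): add the partial results into C.
    C = [[0.0] * L for _ in range(L)]
    for t in range(nth):
        for k in range(L):
            for i in range(L):
                C[k][i] += TMP[t][k][i]
    return C

L, n = 2, 5
R0 = [[1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 0.0, 1.0, 0.0]]
V = [[1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 0.0, -1.0, 0.0, 1.0]]
results = [blocked_product(R0, V, L, n, nth) for nth in (1, 2, 3, 4)]
```

Because every thread writes only to its own slice of TMP, no synchronization is needed inside the loop over j; the cost is the extra $L \times L \times$ NTH storage and the short serial summation at the end.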

45 [Figure: performance in GFLOPS]
Test matrix: size 1,572,864; number of nonzeros 80,216,064
Machine: T2K-Tsukuba; CPU: AMD Opteron 2.3GHz × 4; OpenMP with 16 threads

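46 Results with OpenMP
[Test problem] 1,572,864 (matrix size); 80,216,064 (number of nonzeros); 4 (label not recoverable)
[Environment] CPU: Intel Xeon X... × 2; Mem: 48 GBytes; OS: CentOS 5.3; compiler: Intel Fortran ver. 11.1; options: -fast -openmp
[Table: only the parenthesized entries survive: (179) (179) (179) (181) (181) (181) (181) (181)]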

47 Summary: Krylov subspace methods; Block Krylov subspace methods; parallelization with OpenMP
