2012年度HPCサマーセミナー_多田野.pptx

Similar documents
untitled

Krylov (b) x k+1 := x k + α k p k (c) r k+1 := r k α k Ap k ( := b Ax k+1 ) (d) β k := r k r k 2 2 (e) : r k 2 / r 0 2 < ε R (f) p k+1 :=

XcalableMP入門

IDRstab(s, L) GBiCGSTAB(s, L) 2. AC-GBiCGSTAB(s, L) Ax = b (1) A R n n x R n b R n 2.1 IDR s L r k+1 r k+1 = b Ax k+1 IDR(s) r k+1 = (I ω k A)(r k dr

120802_MPI.ppt

040312研究会HPC2500.ppt

2 /83

¥Ñ¥Ã¥±¡¼¥¸ Rhpc ¤Î¾õ¶·

01_OpenMP_osx.indd

numb.dvi

Krylov A04 October 8, 2010 T. Sakurai (Univ. Tsukuba) Krylov October 8, / 48

untitled

untitled

MPI usage

並列計算の数理とアルゴリズム サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. このサンプルページの内容は, 初版 1 刷発行時のものです.

HPC (pay-as-you-go) HPC Web 2

<4D F736F F F696E74202D D F95C097F D834F E F93FC96E5284D F96E291E85F8DE391E52E >

OpenMP¤òÍѤ¤¤¿ÊÂÎó·×»»¡Ê£²¡Ë

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2

OpenMP (1) 1, 12 1 UNIX (FUJITSU GP7000F model 900), 13 1 (COMPAQ GS320) FUJITSU VPP5000/64 1 (a) (b) 1: ( 1(a))


OpenMP¤òÍѤ¤¤¿ÊÂÎó·×»»¡Ê£±¡Ë

I I / 47

OpenMP¤òÍѤ¤¤¿ÊÂÎó·×»»¡Ê£±¡Ë

日本内科学会雑誌第102巻第4号

弾性定数の対称性について

Microsoft PowerPoint - KHPCSS pptx

コードのチューニング

橡魅力ある数学教材を考えよう.PDF


講義の流れ 並列プログラムの概要 通常のプログラムと並列プログラムの違い 並列プログラム作成手段と並列計算機の構造 OpenMP による並列プログラム作成 処理を複数コアに分割して並列実行する方法 MPI による並列プログラム作成 ( 午後 ) プロセス間通信による並列処理 処理の分割 + データの

02_C-C++_osx.indd

( ) a C n ( R n ) R a R C n. a C n (or R n ) a 0 2. α C( R ) a C n αa = α a 3. a, b C n a + b a + b ( ) p 8..2 (p ) a = [a a n ] T C n p n a

untitled

. (.8.). t + t m ü(t + t) + c u(t + t) + k u(t + t) = f(t + t) () m ü f. () c u k u t + t u Taylor t 3 u(t + t) = u(t) + t! u(t) + ( t)! = u(t) + t u(

GeoFEM開発の経験から

演習準備

1 Ricci V, V i, W f : V W f f(v ) = Imf W ( ) f : V 1 V k W 1

XACCの概要

+ 1 ( ) I IA i i i 1 n m a 11 a 1j a 1m A = a i1 a ij a im a n1 a nj a nm.....

1 12 *1 *2 (1991) (1992) (2002) (1991) (1992) (2002) 13 (1991) (1992) (2002) *1 (2003) *2 (1997) 1

2 T 1 N n T n α = T 1 nt n (1) α = 1 100% OpenMP MPI OpenMP OpenMP MPI (Message Passing Interface) MPI MPICH OpenMPI 1 OpenMP MPI MPI (trivial p


untitled

Microsoft PowerPoint - 演習1:並列化と評価.pptx

橡固有値セミナー2_棚橋改.PDF

C/C++ FORTRAN FORTRAN MPI MPI MPI UNIX Windows (SIMD Single Instruction Multipule Data) SMP(Symmetric Multi Processor) MPI (thread) OpenMP[5]

n (1.6) i j=1 1 n a ij x j = b i (1.7) (1.7) (1.4) (1.5) (1.4) (1.7) u, v, w ε x, ε y, ε x, γ yz, γ zx, γ xy (1.8) ε x = u x ε y = v y ε z = w z γ yz

スライド 1

all.dvi

1重谷.PDF

インテル(R) Visual Fortran Composer XE 2013 Windows版 入門ガイド

. a, b, c, d b a ± d bc ± ad = c ac b a d c = bd ac b a d c = bc ad n m nm [2][3] BASIC [4] B BASIC [5] BASIC Intel x * IEEE a e d

Microsoft PowerPoint _MPI-03.pptx

コードのチューニング

Sae x Sae x 1: 1. {x (i) 0 0 }N i=1 (x (i) 0 0 p(x 0) ) 2. = 1,, T a d (a) i (i = 1,, N) I, II I. v (i) II. x (i) 1 = f (x (i) 1 1, v(i) (b) i (i = 1,

PowerPoint プレゼンテーション


プログラム

(Jacobi Gauss-Seidel SOR ) 1. (Theory of Iteration Method) Jacobi Gauss-Seidel SOR 2. Jacobi (Jacobi s Iteration Method) Jacobi 3. Gauss-Seide

untitled

linearal1.dvi

Kroneher Levi-Civita 1 i = j δ i j = i j 1 if i jk is an even permutation of 1,2,3. ε i jk = 1 if i jk is an odd permutation of 1,2,3. otherwise. 3 4

3 p(3) , 3 +, 2 + 2, 2 + +, p(4) , 6, 7, 8 p(5), p(6), p(7), p(8). p(5) 7, p(6), p(7) 5, p(8) , + +

漸化式のすべてのパターンを解説しましたー高校数学の達人・河見賢司のサイト


情報処理学会研究報告 IPSJ SIG Technical Report Vol.2014-HPC-144 No /5/ CRS 2 CRS Performance evaluation of exclusive version of preconditioned ite

演習準備 2014 年 3 月 5 日神戸大学大学院システム情報学研究科森下浩二 1 RIKEN AICS HPC Spring School /3/5

OHP.dvi

Microsoft Word - 計算科学演習第1回3.doc

Microsoft PowerPoint _MPI-01.pptx

統計数理研究所とスーパーコンピュータ

±é½¬£²¡§£Í£Ð£É½éÊâ

スパコンに通じる並列プログラミングの基礎

ÊÂÎó·×»»¤È¤Ï/OpenMP¤Î½éÊâ¡Ê£±¡Ë

untitled

Microsoft PowerPoint - 演習2:MPI初歩.pptx

スパコンに通じる並列プログラミングの基礎

1 4 1 ( ) ( ) ( ) ( ) () 1 4 2

k 0 given, k t 0. 1 β t U (Af (k t ) k t+1 ) ( 1)+β t+1 U (Af (k t+1 ) k t+2 ) Af (k t+1 ) = 0 (4) t=1,2,3,...,t-1 t=t terminal point k T +1 = 0 2 T k

スパコンに通じる並列プログラミングの基礎

Microsoft PowerPoint - S1-ref-F.ppt [互換モード]

自動残差修正機能付き GBiCGSTAB$(s,L)$法 (科学技術計算アルゴリズムの数理的基盤と展開)

main.dvi

BLOCK TYPE.indd

PowerPoint プレゼンテーション

09中西

1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit)

LCM,GCD LCM GCD..,.. 1 LCM GCD a b a b. a divides b. a b. a, b :, CD(a, b) = {d a, b }, CM(a, b) = {m a, b }... CM(a, b). q > 0, m 1, m 2 CM

Note.tex 2008/09/19( )

T rank A max{rank Q[R Q, J] t-rank T [R T, C \ J] J C} 2 ([1, p.138, Theorem 4.2.5]) A = ( ) Q rank A = min{ρ(j) γ(j) J J C} C, (5) ρ(j) = rank Q[R Q,

情報処理概論(第二日目)

高校生の就職への数学II


9 8 7 (x-1.0)*(x-1.0) *(x-1.0) (a) f(a) (b) f(a) Figure 1: f(a) a =1.0 (1) a 1.0 f(1.0)

JFE.dvi

O x y z O ( O ) O (O ) 3 x y z O O x v t = t = 0 ( 1 ) O t = 0 c t r = ct P (x, y, z) r 2 = x 2 + y 2 + z 2 (t, x, y, z) (ct) 2 x 2 y 2 z 2 = 0

46 4 E E E E E 0 0 E E = E E E = ) E =0 2) φ = 3) ρ =0 1) 0 2) E φ E = grad φ E =0 P P φ = E ds 0

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装

A11 (1993,1994) 29 A12 (1994) 29 A13 Trefethen and Bau Numerical Linear Algebra (1997) 29 A14 (1999) 30 A15 (2003) 30 A16 (2004) 30 A17 (2007) 30 A18

Transcription:

! CCS HPC! I " tadano@cs.tsukuba.ac.jp" " 1

" " " " " " " 2

3

" " Ax = b" " " 4

Ax = b" A = a 11 a 12... a 1n a 21 a 22... a 2n...... a n1 a n2... a nn, x = x 1 x 2. x n, b = b 1 b 2. b n " " 5

Gauss LU 1)! A " " " 2) " Krylov 1)! A " " 2) " 6

Krylov Ax = b x k = x 0 + z k, z k K k (A; r 0 ) Krylov K k (A; r 0 ) = span(r 0, Ar 0,..., A k 1 r 0 ) r k = b Ax k = r 0 Az k K k+1 (A; r 0 ) x 0 " x 1 " x 2 "! x k 1 " x k " x" Krylov 7

Hermite 1. Hermite A = A H Conjugate Gradient method: CG " Conjugate Residual method: CR " Mininal Residual Method: MINRES Hermite " " Hermite A = A H = Ā T (a i j = ā ji ) 8

Hermite x 0 is an initial guess, Compute r 0 = b Ax 0, Set p 0 = r 0, For k = 0, 1,..., until r k 2 ε TOL b 2 do : q k = Ap k, α k = (r k, r k ) (p k, q k ), x k+1 = x k + α k p k, r k+1 = r k α k q k, β k = (r k+1, r k+1 ), (r k, r k ) p k+1 = r k+1 + β k p k, End For CG 9

Hermite 2. Hermite A A H Bi-Conjugate Gradient: BiCG " Conjugate Residual Squared: CGS " BiCG Stabilization: BiCGSTAB Generalized Conjugate Residual: GCR " Generalized Minimal Residual: GMRES 10

Hermite x 0 is an initial guess, Compute r 0 = b Ax 0, Choose r 0 such that (r 0, r 0) 0, Set p 0 = r 0 and p 0 = r 0, For k = 0, 1,..., until r k 2 ε TOL b 2 do : q k = Ap k, q k = AH p k, α k = (r k, r k ) (p k, q k ), x k+1 = x k + α k p k, r k+1 = r k α k q k, r k+1 = r k ᾱ kq k, β k = (r k+1, r k+1 ), (r k, r k ) p k+1 = r k+1 + β k p k, p k+1 = r k+1 + β k p k, End For BiCG 11

Hermite x 0 is an initial guess, Compute r 0 = b Ax 0, Set p 0 = r 0 and q 0 = s 0 = Ar 0, For k = 0, 1,..., until r k 2 ε TOL b 2 do : α k = (q k, r k ) (q k, q k ), x k+1 = x k + α k p k, r k+1 = r k α k q k, s k+1 = Ar k+1, β k,i = (q i, s k+1 ), (i = 0, 1,..., k) p k+1 = r k+1 + q k+1 = s k+1 + End For (q i, q i ) k i=0 β k,i p i, k β k,i q i, i=0 1 1 " " GCR 12

rk 2/ b 2 Relative residual norm,! 10 0 10-2 10-4 10-6 10-8 10-10 10-12 0 50 100 150 200 250 300 Iteration number, k! " BiCG " CGS " BiCGSTAB " GCR 13

3. A = A T A H " Conjugate Orthogonal Conjugate Gradient: COCG " 1 1 " A = A T A H (a i j = a ji ā ji ) 14

x 0 is an initial guess, Compute r 0 = b Ax 0, Set p 0 = r 0, For k = 0, 1,..., until r k 2 ε TOL b 2 do : q k = Ap k, α k = ( r k, r k ) ( p k, q k ), x k+1 = x k + α k p k, r k+1 = r k α k q k, β k = ( r k+1, r k+1 ), ( r k, r k ) p k+1 = r k+1 + β k p k, End For COCG 15

2 Poisson 2 u x + 2 u 2 y = f, 2 in Ω u = ū, on Ω f, ū! x, y (M+1) " 5 y" 1" O" 1" x" M#M M 4 " 5M 2 4M 16

A = a 11 0 a 13 0 a 15 0 a 22 0 a 24 a 25 a 31 a 32 a 33 0 0 0 0 a 43 a 44 0 0 a 52 0 a 54 a 55 "val!:" " "col_ind!:" "row_ptr":" " """ val: a 11 a 13 a 15 a 22 a 24 a 25 a 31 a 32 a 33 a 43 a 44 a 52 a 54 a 55 col_ind: 1 3 5 2 4 5 1 2 3 3 4 2 4 5 row_ptr: 1 4 7 10 12 15 +1 17

A = a 11 0 a 13 0 a 15 0 a 22 0 a 24 a 25 a 31 a 32 a 33 0 0 0 0 a 43 a 44 0 0 a 52 0 a 54 a 55 "val!:" " "row_ind!:" "col_ptr":" " """ val: a 11 a 31 a 22 a 32 a 52 a 13 a 33 a 43 a 24 a 44 a 54 a 15 a 25 a 55 row_ind: 1 3 2 3 5 1 3 4 2 4 5 1 2 5 col_ptr: 1 3 6 9 12 15 +1 18

CRS A x y = Ax" y 1 a 11 a 12... a 1n x 1 y 2 a 21 a 22... a 2n x 2 =........ y n a n1 a n2... a nn x n Fortran Code! do i=1,n! y(i) = 0.0D0! do j=row_ptr(i), row_ptr(i+1)-1! y(i) = y(i)+val(j)*x(col_ind(j))! end do 19

CCS A x y = Ax" x 1 x 2 y = [a 1, a 2,..., a n ] Fortran Code! do i=1,n! y(i) = 0.0D0! do j=1,n! do i=col_ptr(j),col_ptr(j+1)-1! y(row_ind(i)) = y(row_ind(i))+val(i)*x(j)!. x n = n i=1 a i x i 20

CRS y = Ax Proc. 0! Proc. 1! Proc. 2! *" =" Proc. 3! A" x" " " y" MPI_Gather " Proc. 0 21

CCS y = Ax Proc. 0! Proc. 1! Proc. 2! Proc. 3! *" =" +" +" +" A" " x" " y" MPI_Reduce Proc. 0 " 22

x" (x, y) = n j=1 x j y j Proc. 0! Proc. 1! Proc. 2! Proc. 3! y" =" tmp_sum! =" =" =" tmp_sum! tmp_sum! tmp_sum! sum! MPI_Reduce Proc.0 23

MPI program main! include 'mpif.h'!...! call mpi_init(ierr)! call mpi_comm_size(mpi_comm_world, nprocs, ierr)! call mpi_comm_rank(mpi_comm_world, myrank, ierr)!...! tmp_sum = (0.0D0, 0.0D0)! do i=istart(myrank+1), iend(myrank+1)! tmp_sum = tmp_sum + conj(x(i)) * y(i)!! call mpi_reduce(tmp_sum, sum, 1, mpi_double_complex,! mpi_sum, 0, mpi_comm_world, ierr)!...! call mpi_finalize(ierr)! 24

y = y + ax x, y a a MPI_Bcast Proc. 0! Proc. 1! Proc. 2! Proc. 3! a" a" a" a" a" x" y" +" +" +" +" 25

Krylov Krylov " " " " Ax = b" 26

A A K 1 K 2 K 1 1 AK 1 2 I Ax = b (K 1 1 AK 1 2 )(K 2x) = K 1 1 b A 1 A M 1 AM I, MA I Ax = b MAx = Mb Ax = b (AM)(M 1 x) = b 27

A 1 M F(M) = I AM 2 F M F(M) = I AM 2 F = n j=1 e j Am j 2 2 M " M M = A 1 n " A F = n i=1 n a i j 2 j=1 28

A 1 A Neumann I A < 1 A 1 = [I (I A)] 1 = j=0 (I A) j m M A 1 M = M " m j=0 (I A) j 29

Relative residual norm,! rk 2/ b 2 10 2 10 0 10-2 10-4 10-6 10-8 10-10 10-12 0 100 200 300 400 Iteration number, k " " BiCG " BiCG 30

!

L AX = B A : n " X = x (1), x (2),..., x (L), B = b (1), b (2),..., b (L) " A = LU " L OK" " L 32

Block Krylov Block Krylov Block BiCG "O Leary (1980)! Block GMRES "Vital (1990)! Block QMR "Freund (1997)! Block BiCGSTAB "Guennouni (2003)! Block BiCGGR!Tadano (2009)! " 33

Block Krylov 1 10 1 Relative residual norm 10-2 10-5 10-8 10-11 10-14 0 500 1000 1500 2000 Iteration number Block Krylov " L = 1 L = 2 L = 4 " 34

Block BiCGSTAB X 0 C n L is an initial guess, Compute R 0 = B AX 0, Set P 0 = R 0, Choose R 0 C n L, For k = 0, 1,..., until R k F ε B F do: V k = AP k, Solve ( R H 0 V k)α k = R H 0 R k for α k, T k = R k V k α k, Z k = AT k, ζ k = Tr Z H k T k /Tr Z H k Z k, X k+1 = X k + P k α k + ζ k T k, R k+1 = T k ζ k Z k, Solve ( R H 0 V k)β k = R H 0 Z k for β k, P k+1 = R k+1 + (P k ζ k V k )β k, End BiCGSTAB " " 1.! " "1 L " " 2.!! k, " k L " " 3.! " " " " 4. " # k " "Tr[ ] " " 35

CRS " Y = AX Y, X n L do k=1,l! do i=1,n! do j=row_ptr(i), row_ptr(i+1)-1! Y(i,k)=Y(i,k)+A(j)*X(col_ind(j),k)! end do [ ]" X " Fortran " L 36

[ ]" X, Y do i=1,n! do j=row_ptr(i), row_ptr(i+1)-1! do k=1,l! Y(k,i)=Y(k,i)+A(j)*X(k,col_ind(j))! end do X L " 1 " Y " 37

n#l L#L " n#l L#L T k = R k V k α k T T k = RT k αt k VT k do j=1,n! do i=1,l! T(i,j)=R(i,j)! do j=1,n! do i=1,l! do k=1,l! T(k,j)=T(k,j) Alpha(k,i)*V(i,j)! Alpha! k " " 38

L#n n#l! k, " k " " C k = R H 0 V k do j=1,n! do i=1,l! do k=1,l! C(k,i)=C(k,i)+R0(k,j)*V(i,j)! end do R0 " C k " 39

OpenMP " "!$OMP PARALLEL! [ ]"!$OMP END PARALLEL " " "!$OMP PARALLEL "!$OMP END PARALLEL " 40

OpenMP 1.!$OMP DO! do i=1,n! do j=row_ptr(i), row_ptr(i+1)-1! do k=1,l! Y(k,i)=Y(k,i)+A(j)*X(k,col_ind(j))! end do do!$omp DO 41

OpenMP 2. n#l L#L!$OMP DO! do j=1,n! do i=1,l! T(i,j)=R(i,j)!!$OMP DO! do j=1,n! do i=1,l! do k=1,l! T(k,j)=T(k,j) Alpha(k,i)*V(i,j)! 2 42

OpenMP 3. L#n n#l Reduction NTH = OMP_GET_NUM_THREADS()! MYID = OMP_GET_THREAD_NUM()+1!!$OMP SINGLE! allocate( TMP(L,L,NTH) )!!$OMP END SINGLE NTH: " MYID: " TMP: 43

OpenMP C k = R H 0 V k!$omp DO! do j=1,n! do i=1,l! do k=1,l! TMP(k,i,MYID) = TMP(k,i,MYID)+R0(k,j)*V(i,j)!!$OMP BARRIER!!$OMP SINGLE! do k=1,nth! do j=1,l! do i=1,l! C(i,j) = C(i,j) + TMP(i,j,k)!!$OMP END SINGLE " Reduction 44

GFLOPS " 1,572,864 80,216,064 " T2K-Tsukuba 1 147.2 GFLOPS " CPU : AMD Opteron 2.3GHz # 4 OpenMP 16 " 45

OpenMP [ ]" 1, 572, 864" 80, 216, 064" 4" [ ]" CPU: Intel Xeon X5550 2.67 # 2" Mem: 48GBytes" OS: Cent OS 5.3" Intel Fortran ver. 11.1" -fast -openmp / 1 303.49 (179) 1.6955 1.00 2 183.07 (179) 1.0227 1.66 3 138.07 (179) 0.7713 2.20 4 104.61 (181) 0.5749 2.95 5 80 57 (181) 0.4451 3.81 6 78.56 (181) 0.4340 3.91 7 74.96 (181) 0.4141 4.09 8 68.18 (181) 0.3767 4.50 46

" Krylov " " " Block Krylov OpenMP " 47