! CCS HPC! I " tadano@cs.tsukuba.ac.jp" " 1
" " " " " " " 2
3
" " Ax = b" " " 4
Ax = b" A = a 11 a 12... a 1n a 21 a 22... a 2n...... a n1 a n2... a nn, x = x 1 x 2. x n, b = b 1 b 2. b n " " 5
Gauss LU 1)! A " " " 2) " Krylov 1)! A " " 2) " 6
Krylov Ax = b x k = x 0 + z k, z k K k (A; r 0 ) Krylov K k (A; r 0 ) = span(r 0, Ar 0,..., A k 1 r 0 ) r k = b Ax k = r 0 Az k K k+1 (A; r 0 ) x 0 " x 1 " x 2 "! x k 1 " x k " x" Krylov 7
Hermite 1. Hermite A = A H Conjugate Gradient method: CG " Conjugate Residual method: CR " Mininal Residual Method: MINRES Hermite " " Hermite A = A H = Ā T (a i j = ā ji ) 8
Hermite x 0 is an initial guess, Compute r 0 = b Ax 0, Set p 0 = r 0, For k = 0, 1,..., until r k 2 ε TOL b 2 do : q k = Ap k, α k = (r k, r k ) (p k, q k ), x k+1 = x k + α k p k, r k+1 = r k α k q k, β k = (r k+1, r k+1 ), (r k, r k ) p k+1 = r k+1 + β k p k, End For CG 9
Hermite 2. Hermite A A H Bi-Conjugate Gradient: BiCG " Conjugate Residual Squared: CGS " BiCG Stabilization: BiCGSTAB Generalized Conjugate Residual: GCR " Generalized Minimal Residual: GMRES 10
Hermite x 0 is an initial guess, Compute r 0 = b Ax 0, Choose r 0 such that (r 0, r 0) 0, Set p 0 = r 0 and p 0 = r 0, For k = 0, 1,..., until r k 2 ε TOL b 2 do : q k = Ap k, q k = AH p k, α k = (r k, r k ) (p k, q k ), x k+1 = x k + α k p k, r k+1 = r k α k q k, r k+1 = r k ᾱ kq k, β k = (r k+1, r k+1 ), (r k, r k ) p k+1 = r k+1 + β k p k, p k+1 = r k+1 + β k p k, End For BiCG 11
Hermite x 0 is an initial guess, Compute r 0 = b Ax 0, Set p 0 = r 0 and q 0 = s 0 = Ar 0, For k = 0, 1,..., until r k 2 ε TOL b 2 do : α k = (q k, r k ) (q k, q k ), x k+1 = x k + α k p k, r k+1 = r k α k q k, s k+1 = Ar k+1, β k,i = (q i, s k+1 ), (i = 0, 1,..., k) p k+1 = r k+1 + q k+1 = s k+1 + End For (q i, q i ) k i=0 β k,i p i, k β k,i q i, i=0 1 1 " " GCR 12
rk 2/ b 2 Relative residual norm,! 10 0 10-2 10-4 10-6 10-8 10-10 10-12 0 50 100 150 200 250 300 Iteration number, k! " BiCG " CGS " BiCGSTAB " GCR 13
3. A = A T A H " Conjugate Orthogonal Conjugate Gradient: COCG " 1 1 " A = A T A H (a i j = a ji ā ji ) 14
x 0 is an initial guess, Compute r 0 = b Ax 0, Set p 0 = r 0, For k = 0, 1,..., until r k 2 ε TOL b 2 do : q k = Ap k, α k = ( r k, r k ) ( p k, q k ), x k+1 = x k + α k p k, r k+1 = r k α k q k, β k = ( r k+1, r k+1 ), ( r k, r k ) p k+1 = r k+1 + β k p k, End For COCG 15
2 Poisson 2 u x + 2 u 2 y = f, 2 in Ω u = ū, on Ω f, ū! x, y (M+1) " 5 y" 1" O" 1" x" M#M M 4 " 5M 2 4M 16
A = a 11 0 a 13 0 a 15 0 a 22 0 a 24 a 25 a 31 a 32 a 33 0 0 0 0 a 43 a 44 0 0 a 52 0 a 54 a 55 "val!:" " "col_ind!:" "row_ptr":" " """ val: a 11 a 13 a 15 a 22 a 24 a 25 a 31 a 32 a 33 a 43 a 44 a 52 a 54 a 55 col_ind: 1 3 5 2 4 5 1 2 3 3 4 2 4 5 row_ptr: 1 4 7 10 12 15 +1 17
A = a 11 0 a 13 0 a 15 0 a 22 0 a 24 a 25 a 31 a 32 a 33 0 0 0 0 a 43 a 44 0 0 a 52 0 a 54 a 55 "val!:" " "row_ind!:" "col_ptr":" " """ val: a 11 a 31 a 22 a 32 a 52 a 13 a 33 a 43 a 24 a 44 a 54 a 15 a 25 a 55 row_ind: 1 3 2 3 5 1 3 4 2 4 5 1 2 5 col_ptr: 1 3 6 9 12 15 +1 18
CRS A x y = Ax" y 1 a 11 a 12... a 1n x 1 y 2 a 21 a 22... a 2n x 2 =........ y n a n1 a n2... a nn x n Fortran Code! do i=1,n! y(i) = 0.0D0! do j=row_ptr(i), row_ptr(i+1)-1! y(i) = y(i)+val(j)*x(col_ind(j))! end do 19
CCS A x y = Ax" x 1 x 2 y = [a 1, a 2,..., a n ] Fortran Code! do i=1,n! y(i) = 0.0D0! do j=1,n! do i=col_ptr(j),col_ptr(j+1)-1! y(row_ind(i)) = y(row_ind(i))+val(i)*x(j)!. x n = n i=1 a i x i 20
CRS y = Ax Proc. 0! Proc. 1! Proc. 2! *" =" Proc. 3! A" x" " " y" MPI_Gather " Proc. 0 21
CCS y = Ax Proc. 0! Proc. 1! Proc. 2! Proc. 3! *" =" +" +" +" A" " x" " y" MPI_Reduce Proc. 0 " 22
x" (x, y) = n j=1 x j y j Proc. 0! Proc. 1! Proc. 2! Proc. 3! y" =" tmp_sum! =" =" =" tmp_sum! tmp_sum! tmp_sum! sum! MPI_Reduce Proc.0 23
MPI program main! include 'mpif.h'!...! call mpi_init(ierr)! call mpi_comm_size(mpi_comm_world, nprocs, ierr)! call mpi_comm_rank(mpi_comm_world, myrank, ierr)!...! tmp_sum = (0.0D0, 0.0D0)! do i=istart(myrank+1), iend(myrank+1)! tmp_sum = tmp_sum + conj(x(i)) * y(i)!! call mpi_reduce(tmp_sum, sum, 1, mpi_double_complex,! mpi_sum, 0, mpi_comm_world, ierr)!...! call mpi_finalize(ierr)! 24
y = y + ax x, y a a MPI_Bcast Proc. 0! Proc. 1! Proc. 2! Proc. 3! a" a" a" a" a" x" y" +" +" +" +" 25
Krylov Krylov " " " " Ax = b" 26
A A K 1 K 2 K 1 1 AK 1 2 I Ax = b (K 1 1 AK 1 2 )(K 2x) = K 1 1 b A 1 A M 1 AM I, MA I Ax = b MAx = Mb Ax = b (AM)(M 1 x) = b 27
A 1 M F(M) = I AM 2 F M F(M) = I AM 2 F = n j=1 e j Am j 2 2 M " M M = A 1 n " A F = n i=1 n a i j 2 j=1 28
A 1 A Neumann I A < 1 A 1 = [I (I A)] 1 = j=0 (I A) j m M A 1 M = M " m j=0 (I A) j 29
Relative residual norm,! rk 2/ b 2 10 2 10 0 10-2 10-4 10-6 10-8 10-10 10-12 0 100 200 300 400 Iteration number, k " " BiCG " BiCG 30
!
L AX = B A : n " X = x (1), x (2),..., x (L), B = b (1), b (2),..., b (L) " A = LU " L OK" " L 32
Block Krylov Block Krylov Block BiCG "O Leary (1980)! Block GMRES "Vital (1990)! Block QMR "Freund (1997)! Block BiCGSTAB "Guennouni (2003)! Block BiCGGR!Tadano (2009)! " 33
Block Krylov 1 10 1 Relative residual norm 10-2 10-5 10-8 10-11 10-14 0 500 1000 1500 2000 Iteration number Block Krylov " L = 1 L = 2 L = 4 " 34
Block BiCGSTAB X 0 C n L is an initial guess, Compute R 0 = B AX 0, Set P 0 = R 0, Choose R 0 C n L, For k = 0, 1,..., until R k F ε B F do: V k = AP k, Solve ( R H 0 V k)α k = R H 0 R k for α k, T k = R k V k α k, Z k = AT k, ζ k = Tr Z H k T k /Tr Z H k Z k, X k+1 = X k + P k α k + ζ k T k, R k+1 = T k ζ k Z k, Solve ( R H 0 V k)β k = R H 0 Z k for β k, P k+1 = R k+1 + (P k ζ k V k )β k, End BiCGSTAB " " 1.! " "1 L " " 2.!! k, " k L " " 3.! " " " " 4. " # k " "Tr[ ] " " 35
CRS " Y = AX Y, X n L do k=1,l! do i=1,n! do j=row_ptr(i), row_ptr(i+1)-1! Y(i,k)=Y(i,k)+A(j)*X(col_ind(j),k)! end do [ ]" X " Fortran " L 36
[ ]" X, Y do i=1,n! do j=row_ptr(i), row_ptr(i+1)-1! do k=1,l! Y(k,i)=Y(k,i)+A(j)*X(k,col_ind(j))! end do X L " 1 " Y " 37
n#l L#L " n#l L#L T k = R k V k α k T T k = RT k αt k VT k do j=1,n! do i=1,l! T(i,j)=R(i,j)! do j=1,n! do i=1,l! do k=1,l! T(k,j)=T(k,j) Alpha(k,i)*V(i,j)! Alpha! k " " 38
L#n n#l! k, " k " " C k = R H 0 V k do j=1,n! do i=1,l! do k=1,l! C(k,i)=C(k,i)+R0(k,j)*V(i,j)! end do R0 " C k " 39
OpenMP " "!$OMP PARALLEL! [ ]"!$OMP END PARALLEL " " "!$OMP PARALLEL "!$OMP END PARALLEL " 40
OpenMP 1.!$OMP DO! do i=1,n! do j=row_ptr(i), row_ptr(i+1)-1! do k=1,l! Y(k,i)=Y(k,i)+A(j)*X(k,col_ind(j))! end do do!$omp DO 41
OpenMP 2. n#l L#L!$OMP DO! do j=1,n! do i=1,l! T(i,j)=R(i,j)!!$OMP DO! do j=1,n! do i=1,l! do k=1,l! T(k,j)=T(k,j) Alpha(k,i)*V(i,j)! 2 42
OpenMP 3. L#n n#l Reduction NTH = OMP_GET_NUM_THREADS()! MYID = OMP_GET_THREAD_NUM()+1!!$OMP SINGLE! allocate( TMP(L,L,NTH) )!!$OMP END SINGLE NTH: " MYID: " TMP: 43
OpenMP C k = R H 0 V k!$omp DO! do j=1,n! do i=1,l! do k=1,l! TMP(k,i,MYID) = TMP(k,i,MYID)+R0(k,j)*V(i,j)!!$OMP BARRIER!!$OMP SINGLE! do k=1,nth! do j=1,l! do i=1,l! C(i,j) = C(i,j) + TMP(i,j,k)!!$OMP END SINGLE " Reduction 44
GFLOPS " 1,572,864 80,216,064 " T2K-Tsukuba 1 147.2 GFLOPS " CPU : AMD Opteron 2.3GHz # 4 OpenMP 16 " 45
OpenMP [ ]" 1, 572, 864" 80, 216, 064" 4" [ ]" CPU: Intel Xeon X5550 2.67 # 2" Mem: 48GBytes" OS: Cent OS 5.3" Intel Fortran ver. 11.1" -fast -openmp / 1 303.49 (179) 1.6955 1.00 2 183.07 (179) 1.0227 1.66 3 138.07 (179) 0.7713 2.20 4 104.61 (181) 0.5749 2.95 5 80 57 (181) 0.4451 3.81 6 78.56 (181) 0.4340 3.91 7 74.96 (181) 0.4141 4.09 8 68.18 (181) 0.3767 4.50 46
" Krylov " " " Block Krylov OpenMP " 47