3 3 BASIC C++ 8 Tflop/s 8TB [] High precision symmetric eigenvalue computation through exact tridiagonalization by using rational number arithmetic Hikaru Samukawa Abstract: Since rational number arithmetic, which is different from floating-point arithmetic or multiple-precision arithmetic, can provide exact computations, it is valuable from mathematical point of view. However its application area is limited because of its lack of ability to compute irrational numbers, such as eigenvalue analysis. A self-validation technique for eigenvalue analysis is difficult because it requires deep mathematical and programming knowledge. In this paper, we propose a method of exact reduction from symmetric matrix to tridiagonal matrix by using implicit square root formulation. A rational number arithmetic is possible in decimal BASIC for small problems. However for the normal size problems, a huge memory and tuning including parallelization are required. We have developed a C++ programming environment capable of using rational number variables mixed with other types of variables upon multi-digit integers and rational number arithmetics layers. Currently a general purpose server provides 8 Tflop/s computing power and 8 TB memory capacity[]. Since the rational number arithmetic is sensitive to a bit error, it is suitable to be used for benchmark programs. Keywords: rational number arithmetic, self-varidation computation, symmetric eigenvalue problem, implicit square root formulation c 203 Information Processing Society of Japan
. a, b, c, d b a ± d bc ± ad = c ac b a d c = bd ac b a d c = bc ad n m nm [2][3] BASIC [4] B BASIC [5] BASIC 0 5 0 000 2 Intel x86 0 5 5 * IEEE a e d i 0 2 ( ) 52 a = ± d i 2 i 2 e = A 2 e () i=0 B = A 2 52 2 54 7 B () a = B 2 e 52 C =52 e a = B 2 C B B 2 C B k 0 B 2 k 2 C k a 0.5 0 7 IEEE e max = 023 Sibaura Institute of Technology, Minuma, Saitama 337 8570, Japan * 0 0 2 023 52 2 0 292 2 0.4 0.4 00000000000000 }{{} 4 3, 602, 879, 70, 896, 397 0.4 = 9, 007, 99, 254, 740, 992 7, 205, 759, 403, 792, 795 0.400 0 = 8, 04, 398, 509, 48, 984 (2) (3) 0.4 2 0.4 2 6 7. BASIC 2000 a ij = i + j (4) 2 0 5 n =2 30% n =3 00%.2 c 203 Information Processing Society of Japan 2
*2 2 f(x) (a, b) c = a + b 2 a b 4 f(x) =x 4 0x 3 +5x 2 7x + (5) f(x) =(x )(x 3 9x 2 +6x ) x = (a, b) =(0.9,.2) 2 2 20 39 40 8 80 59 60 32 320, *3 c 2 k 0 2 k *4 c c c best approximating fraction ratround(x,m) x 2 m x 0 m [6] c *2 LU ( *3 9 x = 0 + 6 ) 2= 2 5 20 ( 9 x 2 = 0 + 2 ) 2= 39 20 40 *4 2. Q Q T Q = I m n A Q Q j q j for j =,n j p j = a j (q T k a j )q k loop k= q j = p j p j p j = p T j p j 2 Ax b m n A n =3A QR r r 2 r 3 A = QR =(q q 2 q 3 ) 0 r 22 r 23 (6) Q T 0 0 r 33 Q T Ax = Q T b R q k = p p T k = k p k p T k p k r kk (q T k a j)q k (q T k a j )q k = (p T k a j ) p r kk r k = kk rkk 2 (p T k a j )p k 2 Q *5 n i j i j p T i p j =0 (6) A = PR =(p p 2 p 3 ) 0 r 2 r 3 r r r 23 r 0 0 (7) q q i q i = rp i r pi 2 r 2 rp 2 i *5 Ax b P T Ax = P T b c 203 Information Processing Society of Japan 3
implicit square root formulation *6 A = CC T LDL T *7 Ax = λbx B = CC T (C AC T )(C T x)=λ(c T x) 3. Ax = λx 3 3 3 LDL T *6 Classical Gram-Schmidt CGS Modified Gram-Schmidt MGS CGS 2 CGS 3 [7] CGS MGS CGS *7 LDL T D = diag(d i ) s i = d i s i S LDL T = LSSL T C = LS LDL T CC T A ωi LDL T det(a ωi) D A ω A n 3 3 3 3 3 2 3 3 Wilkinson k 3 k [8], p. 300 3 *8 3 3 conjugate gradient CG 3. A 3 Q T AQ = T Q AQ = QT *8 3 3 2 c 203 Information Processing Society of Japan 4
A[q, q 2,...,q n ] α β. β α.. 2 =[q, q 2,...,q n ]...... βn β n α n Aq = α q + β q 2 q T qt q 2 = 0 α = q T Aq r =(A α I)q β = r q 2 = r β 2 j Aq j = β j q j + α j q j + β j q j+ (8) α j = q T j Aq j q j+ r j =(A α j I)q j β j q j (9) β j = r j q j+ = r j q j β j α j q j β j (8) Aq j = s j q j + α j q j + s j q j+ 3 2 q = sp 3.2 CG 3 A Ax = b CG CG [9] 3 3 [0], p. 528 γ k = p T k Ap k, δk = r k, β k (0) q j r k *9 β k β k *9 q CG r 0 3 β k 2 β 2... B k =, ()... βk Δ 3 T k T k =Δ Bk T Pk T AP k B k Δ γ =Δ Bk T... γ k B kδ (2) B T k P T k AP kb k 3 d i s i T k d s s d 2 s 2 T k =Δ. s.. Δ 2 d k = δ d δ δ 2 s δ δ 2 s δ 2 d 2 δ2 δ 3 s 2 δ2 δ 3 s 2... δ k d k (3) 3 2 δ k * 0 3 T k ( δ i δ i+ s i ) 2 3 ω T ωi = A ωi *0 q k r k CG r k 2 r k = p k β k p k β k 2 2 r T k r k c 203 Information Processing Society of Japan 5
4. 4. 2 32 32 64 32,768 0 3 FFT FFT 2 32 2 24 FFT 2 24 2 8 = 256 0 800 2 GCD GCD [2][3] [9] longint rational 4 C++ longint rational 4.2 LDL LDL T A u ij l ij l ji = u ij /u ii A = LDL T D U = DL T t ij = l ji L T i u ij = a ij t ki u kj, (i j) (4) k= t ij = u ij u ii (i<j) (5) Fortran C n * a ij = n max(i, j) + (6) (4) a ij 0.5 4 n = 00 0.7 3.8 92.6 a ij 0.5 989.3 Intel Core i5 2.67GHz D d i n i + n i +2 O(n 3 ) D d i 2 * 2 * n = 5 LDL T L 5 4 3 2 0 0 0 0 4 4 3 2 4 3 3 3 2, 0 0 0 5 3 3 0 0 5 4 D 2 2 2 2 2 2 2 0 5 4 3 5 4 3 2 diag ( 5, 4 5, 3 4, 2 3, 2 ) *2 A n d i ill-condition c 203 Information Processing Society of Japan 6
O(n 4 ) a ij 0.5 d i 33 d 300 0,000 (4) 2 O(n 4 ) O(n 5 ) O(n 5 ) FFT LDL T LDL T 00 000 O(n 3 ) O(n 5 ) 4.3 3 4.3. 3 (3) 3 d i,s i,δ i CG k =0; r 0 = initialize; δ = r 0 2 β =; d = γ ; s = γ δ do k = k + if(k =) p = r 0 else β k = rt k r k r T k 2 r k 2 p k = r k + β k p k end α k = k r rt k p T Ap k k γ k = p T k Ap k if(k >) d k = γ k + γ k βk 2 δ k s k = γ k β k end r k = r k α k Ap k δ k+ = r k 2 if(δ k+ =0)exitdo loop CG n 3 n 3 T b i 0 a b 2 b 2 a 2 b 3 T = b 3 a 3... T i T i T i ωi p i (ω) =det(t i ωi) i n p i (ω) p 0 (ω) = p (ω) =a ω do i =2,n p i (ω) =(a i ω)p i (ω) b 2 i p i 2(ω) loop b 2 i δ iδ i+ s 2 i 4.3.2 [ω l,ω u ] dω = ωu ω l n k =0; dω l = ω l ; ω = ω l do dω u = dω l + dω call TDCdet(A, dω u,f,no) if(no k = ) then call Bisect(A, dω l,dω u,ω) k = k + dω l = dω u dω = ωu ω l n else if(no k = 0) then else end if dω l = dω u dω = dω 2 c 203 Information Processing Society of Japan 7
loop if(k = n) then exit loop TDCdet 3 T ω f(ω) = T ωi f ω no (dω l,dω u ) (a, b) Bisect 2 SUB Bisect(A, a, b, ω) call TDCdet(A, a, fa, no) call TDCdet(A, b, fb, no) do ( ) fa (b a) c = ratround a fb fa, ord call TDCdet(A, c, fc, no) if(fc =0)thenexitloop if(fa fc < 0) then b = c; fb = fc else a = c; fa = fc end if if( fa fb <ɛ)thenexitloop loop ω = c 2 2 (a, f(a)) (b, f(b)) x x c = a f(a) (b a) f(b) f(a) (7) ratround 4.3.3 2 2 ω 0 ω n n + det(a ω i I) characteristic polynomial n + n + n =3 27 9 3 a 0 det(a 3I) 8 4 2 a a = det(a 2I) 2 det(a I) 0 0 0 det(a) a 3 ω 0 ω n 0 n a 0 a n 2 * 3 f(ω) =a 0 ω n + a ω n + + a n =0 2 2 ratround n 3 * 4 n =4 (5) ratround m a b f(a) f(b) 4.3.4 x j =(A λ i I) x j (8) λ i 3 3 [] 3 3 T Wilkinson A [2] 3 *3 2 *4 n =4 x =(, 0,, ) T Ax = x c 203 Information Processing Society of Japan 8
3 32 33 34 35 36 25 30 9 24 3 8 7 2 2 3 4 5 6 T =Δ Bk T Pk T A P k B k Δ A A A T A 3 T A A 3 3 4.4 4.4. l m m 5 [7] 4 4 3 2 2 i λ i λ i 0.763932... 2,3.763932... 73, 95 4 2.763932... 5,6 3. 0027, 0044 7,8,9,0 4. 9982, 0044,2 5. 999, 0000 3 5.236068... 4,5 6.236068... 863, 889 6 7.236068... 4 0 A = 4 0 0 4 0 4 4 4 2 m m 4 m 3 q * 5 CG 3 λ λ 5 λ 3 i i CG 6 λ 2 λ 7 λ 4 i MOD(i 2, 6) CG 9 * 6 CG 3 3 T k 3 2 *5 *6 28 0 c 203 Information Processing Society of Japan 9
4.4.2 3 6 * 7 2 print * 8 0. 4 4 a ii =0.4 A a a (2) (3) 0 20 i =5, 6.29999999999999988048.3000000000000005837 6 * 9 (λ m) k k * 20 Fortran C *7 R print digit 6 *8 (a + b)+c a +(b + c) *9 BASIC *20 (λ m) k m 5. Ax = λx A k...3 k 3 BASIC Fortran C c 203 Information Processing Society of Japan 20
* 2 976 Cray- 60M flop/s 202 Blue Gene/Q 6P flop/s [3] LU 000 * 22 CG (6) LDL T i, j d i l ij A. LDL Fortran [7], p. 28 Fortran 2 C++ rational void LDL(rational_matrix& a, int n){ rational s,t; *2 *22 fail } int i,j,k; for (j=; j < n; ++j) { for (i=; i < j; ++i) { s=0; for (k=0; k < i; ++k) { s = s + a[k][i] * a[k][j]; } a[i][j] = a[i][j] - s; } s=0; for (k=0; k <= j-; ++k) { t = a[k][j]/a[k][k]; s = s + t * a[k][j]; a[k][j] = t; } a[j][j] = a[j][j] - s; } [] B. Sinharoy, R. N. Kalla, and et.al., IBM Power7 Multicore Server Processor, IBM J. Res. Develop., 55(3):/ /29, 20. [2] D. Knuth, The Art of Computer Programming, Volume 2, Addison-Wesley, 969. / 98. [3] D. Knuth, The Art of Computer Programming, Volume 2, Third Edition, Addison-Wesley, 998. The Art of Computer Programming, Third Edition 2004. [4] http://hp.vector.co.jp/authors/va008683/. [5],, B,, 2008. [6] Hikaru Samukawa, Rational Number Arithmetic Including Square Root Handling, In 3th JSST Annual Conference (JSST 202) International Conference in Modeling and Simulation Technology, 202. [7],,,, HPC ITText,, 2009. [8] J. H. Wilkinson, Algebraic Eigenvalue Problem, Oxford University Press, 965. [9] Hikaru Samukawa, A Study of Rational Number Arithmetic, In 30th JSST Annual Conference (JSST 20) International Conference in Modeling and Simulation Technology, 20. [0] G. H. Golub and C. F. Van Loan, Matrix Computations third edition, Johns Hopkins University Press, 996. [] I. S. Dhillon, A New O(n 2 ) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem, 989. [2] J. H. Wilkinson, Rounding Errors in Algebraic Processes, Her Britannic Majesty s Stationery Office, 963,, 974. c 203 Information Processing Society of Japan 2
[3] https://www.llnl.gov/news/newsreleases/202/jun/nr- 2-06-07.html. c 203 Information Processing Society of Japan 22