. a, b, c, d b a ± d bc ± ad = c ac b a d c = bd ac b a d c = bc ad n m nm [2][3] BASIC [4] B BASIC [5] BASIC Intel x * IEEE a e d

Similar documents
23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h

2017 (413812)

1 n A a 11 a 1n A =.. a m1 a mn Ax = λx (1) x n λ (eigenvalue problem) x = 0 ( x 0 ) λ A ( ) λ Ax = λx x Ax = λx y T A = λy T x Ax = λx cx ( 1) 1.1 Th

IPSJ SIG Technical Report Vol.2016-CE-137 No /12/ e β /α α β β / α A judgment method of difficulty of task for a learner using simple

1 Fig. 1 Extraction of motion,.,,, 4,,, 3., 1, 2. 2.,. CHLAC,. 2.1,. (256 ).,., CHLAC. CHLAC, HLAC. 2.3 (HLAC ) r,.,. HLAC. N. 2 HLAC Fig. 2

2 HI LO ZDD 2 ZDD 2 HI LO 2 ( ) HI (Zero-suppress ) Zero-suppress ZDD ZDD Zero-suppress 1 ZDD abc a HI b c b Zero-suppress b ZDD ZDD 5) ZDD F 1 F = a

THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE.

IPSJ SIG Technical Report Vol.2012-CG-148 No /8/29 3DCG 1,a) On rigid body animation taking into account the 3D computer graphics came

<95DB8C9288E397C389C88A E696E6462>

Input image Initialize variables Loop for period of oscillation Update height map Make shade image Change property of image Output image Change time L

17 Proposal of an Algorithm of Image Extraction and Research on Improvement of a Man-machine Interface of Food Intake Measuring System

Iteration 0 Iteration 1 1 Iteration 2 Iteration 3 N N N! N 1 MOPT(Merge Optimization) 3) MOPT MOP

1 [1, 2, 3, 4, 5, 8, 9, 10, 12, 15] The Boston Public Schools system, BPS (Deferred Acceptance system, DA) (Top Trading Cycles system, TTC) cf. [13] [

Vol.55 No (Jan. 2014) saccess 6 saccess 7 saccess 2. [3] p.33 * B (A) (B) (C) (D) (E) (F) *1 [3], [4] Web PDF a m

x, y x 3 y xy 3 x 2 y + xy 2 x 3 + y 3 = x 3 y xy 3 x 2 y + xy 2 x 3 + y 3 = 15 xy (x y) (x + y) xy (x y) (x y) ( x 2 + xy + y 2) = 15 (x y)

A11 (1993,1994) 29 A12 (1994) 29 A13 Trefethen and Bau Numerical Linear Algebra (1997) 29 A14 (1999) 30 A15 (2003) 30 A16 (2004) 30 A17 (2007) 30 A18

A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member

1 Web [2] Web [3] [4] [5], [6] [7] [8] S.W. [9] 3. MeetingShelf Web MeetingShelf MeetingShelf (1) (2) (3) (4) (5) Web MeetingShelf

IPSJ SIG Technical Report Vol.2011-EC-19 No /3/ ,.,., Peg-Scope Viewer,,.,,,,. Utilization of Watching Logs for Support of Multi-

GPGPU

2 ( ) i

,,,,., C Java,,.,,.,., ,,.,, i

( )

& Vol.5 No (Oct. 2015) TV 1,2,a) , Augmented TV TV AR Augmented Reality 3DCG TV Estimation of TV Screen Position and Ro

JOURNAL OF THE JAPANESE ASSOCIATION FOR PETROLEUM TECHNOLOGY VOL. 66, NO. 6 (Nov., 2001) (Received August 10, 2001; accepted November 9, 2001) Alterna

1 Abstract 2 3 n a ax 2 + bx + c = 0 (a 0) (1) ( x + b ) 2 = b2 4ac 2a 4a 2 D = b 2 4ac > 0 (1) 2 D = 0 D < 0 x + b 2a = ± b2 4ac 2a b ± b 2

The 15th Game Programming Workshop 2010 Magic Bitboard Magic Bitboard Bitboard Magic Bitboard Bitboard Magic Bitboard Magic Bitboard Magic Bitbo

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS ) GPS Global Positioning System

II 2 3.,, A(B + C) = AB + AC, (A + B)C = AC + BC. 4. m m A, m m B,, m m B, AB = BA, A,, I. 5. m m A, m n B, AB = B, A I E, 4 4 I, J, K


[2] OCR [3], [4] [5] [6] [4], [7] [8], [9] 1 [10] Fig. 1 Current arrangement and size of ruby. 2 Fig. 2 Typography combined with printing

Web Web Web Web Web, i

JFE.dvi

日本感性工学会論文誌

(a) (b) (c) Canny (d) 1 ( x α, y α ) 3 (x α, y α ) (a) A 2 + B 2 + C 2 + D 2 + E 2 + F 2 = 1 (3) u ξ α u (A, B, C, D, E, F ) (4) ξ α (x 2 α, 2x α y α,

T rank A max{rank Q[R Q, J] t-rank T [R T, C \ J] J C} 2 ([1, p.138, Theorem 4.2.5]) A = ( ) Q rank A = min{ρ(j) γ(j) J J C} C, (5) ρ(j) = rank Q[R Q,

2. CABAC CABAC CABAC 1 1 CABAC Figure 1 Overview of CABAC 2 DCT 2 0/ /1 CABAC [3] 3. 2 値化部 コンテキスト計算部 2 値算術符号化部 CABAC CABAC

28 Horizontal angle correction using straight line detection in an equirectangular image

コンピュータ概論

2) TA Hercules CAA 5 [6], [7] CAA BOSS [8] 2. C II C. ( 1 ) C. ( 2 ). ( 3 ) 100. ( 4 ) () HTML NFS Hercules ( )

,,.,.,,.,.,.,.,,.,..,,,, i

IPSJ SIG Technical Report Vol.2014-EIP-63 No /2/21 1,a) Wi-Fi Probe Request MAC MAC Probe Request MAC A dynamic ads control based on tra

( ) [1] [4] ( ) 2. [5] [6] Piano Tutor[7] [1], [2], [8], [9] Radiobaton[10] Two Finger Piano[11] Coloring-in Piano[12] ism[13] MIDI MIDI 1 Fig. 1 Syst

(a) 1 (b) 3. Gilbert Pernicka[2] Treibitz Schechner[3] Narasimhan [4] Kim [5] Nayar [6] [7][8][9] 2. X X X [10] [11] L L t L s L = L t + L s

linearal1.dvi

Fig. 3 Flow diagram of image processing. Black rectangle in the photo indicates the processing area (128 x 32 pixels).

浜松医科大学紀要

EGunGPU

AtCoder Regular Contest 073 Editorial Kohei Morita(yosupo) A: Shiritori if python3 a, b, c = input().split() if a[len(a)-1] == b[0] and b[len(

Vol.54 No (July 2013) [9] [10] [11] [12], [13] 1 Fig. 1 Flowchart of the proposed system. c 2013 Information

25 II :30 16:00 (1),. Do not open this problem booklet until the start of the examination is announced. (2) 3.. Answer the following 3 proble

(a) (b) 1 JavaScript Web Web Web CGI Web Web JavaScript Web mixi facebook SNS Web URL ID Web 1 JavaScript Web 1(a) 1(b) JavaScript & Web Web Web Webji


149 (Newell [5]) Newell [5], [1], [1], [11] Li,Ryu, and Song [2], [11] Li,Ryu, and Song [2], [1] 1) 2) ( ) ( ) 3) T : 2 a : 3 a 1 :

(Basic of Proability Theory). (Probability Spacees ad Radom Variables , (Expectatios, Meas) (Weak Law

6 2. AUTOSAR 2.1 AUTOSAR AUTOSAR ECU OSEK/VDX 3) OSEK/VDX OS AUTOSAR AUTOSAR ECU AUTOSAR 1 AUTOSAR BSW (Basic Software) (Runtime Environment) Applicat

) 9 81

A S- hara/lectures/lectures-j.html r A = A 5 : 5 = max{ A, } A A A A B A, B A A A %

2012年度HPCサマーセミナー_多田野.pptx

ID 3) 9 4) 5) ID 2 ID 2 ID 2 Bluetooth ID 2 SRCid1 DSTid2 2 id1 id2 ID SRC DST SRC 2 2 ID 2 2 QR 6) 8) 6) QR QR QR QR

. IDE JIVE[1][] Eclipse Java ( 1) Java Platform Debugger Architecture [5] 3. Eclipse GUI JIVE 3.1 Eclipse ( ) 1 JIVE Java [3] IDE c 016 Information Pr

v v = v 1 v 2 v 3 (1) R = (R ij ) (2) R (R 1 ) ij = R ji (3) 3 R ij R ik = δ jk (4) i=1 δ ij Kronecker δ ij = { 1 (i = j) 0 (i

2

橡最新卒論

DPA,, ShareLog 3) 4) 2.2 Strino Strino STRain-based user Interface with tacticle of elastic Natural ObjectsStrino 1 Strino ) PC Log-Log (2007 6)

Transcription:

3 3 BASIC C++ 8 Tflop/s 8TB [] High precision symmetric eigenvalue computation through exact tridiagonalization by using rational number arithmetic Hikaru Samukawa Abstract: Since rational number arithmetic, which is different from floating-point arithmetic or multiple-precision arithmetic, can provide exact computations, it is valuable from mathematical point of view. However its application area is limited because of its lack of ability to compute irrational numbers, such as eigenvalue analysis. A self-validation technique for eigenvalue analysis is difficult because it requires deep mathematical and programming knowledge. In this paper, we propose a method of exact reduction from symmetric matrix to tridiagonal matrix by using implicit square root formulation. A rational number arithmetic is possible in decimal BASIC for small problems. However for the normal size problems, a huge memory and tuning including parallelization are required. We have developed a C++ programming environment capable of using rational number variables mixed with other types of variables upon multi-digit integers and rational number arithmetics layers. Currently a general purpose server provides 8 Tflop/s computing power and 8 TB memory capacity[]. Since the rational number arithmetic is sensitive to a bit error, it is suitable to be used for benchmark programs. Keywords: rational number arithmetic, self-varidation computation, symmetric eigenvalue problem, implicit square root formulation c 203 Information Processing Society of Japan

. a, b, c, d b a ± d bc ± ad = c ac b a d c = bd ac b a d c = bc ad n m nm [2][3] BASIC [4] B BASIC [5] BASIC 0 5 0 000 2 Intel x86 0 5 5 * IEEE a e d i 0 2 ( ) 52 a = ± d i 2 i 2 e = A 2 e () i=0 B = A 2 52 2 54 7 B () a = B 2 e 52 C =52 e a = B 2 C B B 2 C B k 0 B 2 k 2 C k a 0.5 0 7 IEEE e max = 023 Sibaura Institute of Technology, Minuma, Saitama 337 8570, Japan * 0 0 2 023 52 2 0 292 2 0.4 0.4 00000000000000 }{{} 4 3, 602, 879, 70, 896, 397 0.4 = 9, 007, 99, 254, 740, 992 7, 205, 759, 403, 792, 795 0.400 0 = 8, 04, 398, 509, 48, 984 (2) (3) 0.4 2 0.4 2 6 7. BASIC 2000 a ij = i + j (4) 2 0 5 n =2 30% n =3 00%.2 c 203 Information Processing Society of Japan 2

*2 2 f(x) (a, b) c = a + b 2 a b 4 f(x) =x 4 0x 3 +5x 2 7x + (5) f(x) =(x )(x 3 9x 2 +6x ) x = (a, b) =(0.9,.2) 2 2 20 39 40 8 80 59 60 32 320, *3 c 2 k 0 2 k *4 c c c best approximating fraction ratround(x,m) x 2 m x 0 m [6] c *2 LU ( *3 9 x = 0 + 6 ) 2= 2 5 20 ( 9 x 2 = 0 + 2 ) 2= 39 20 40 *4 2. Q Q T Q = I m n A Q Q j q j for j =,n j p j = a j (q T k a j )q k loop k= q j = p j p j p j = p T j p j 2 Ax b m n A n =3A QR r r 2 r 3 A = QR =(q q 2 q 3 ) 0 r 22 r 23 (6) Q T 0 0 r 33 Q T Ax = Q T b R q k = p p T k = k p k p T k p k r kk (q T k a j)q k (q T k a j )q k = (p T k a j ) p r kk r k = kk rkk 2 (p T k a j )p k 2 Q *5 n i j i j p T i p j =0 (6) A = PR =(p p 2 p 3 ) 0 r 2 r 3 r r r 23 r 0 0 (7) q q i q i = rp i r pi 2 r 2 rp 2 i *5 Ax b P T Ax = P T b c 203 Information Processing Society of Japan 3

implicit square root formulation *6 A = CC T LDL T *7 Ax = λbx B = CC T (C AC T )(C T x)=λ(c T x) 3. Ax = λx 3 3 3 LDL T *6 Classical Gram-Schmidt CGS Modified Gram-Schmidt MGS CGS 2 CGS 3 [7] CGS MGS CGS *7 LDL T D = diag(d i ) s i = d i s i S LDL T = LSSL T C = LS LDL T CC T A ωi LDL T det(a ωi) D A ω A n 3 3 3 3 3 2 3 3 Wilkinson k 3 k [8], p. 300 3 *8 3 3 conjugate gradient CG 3. A 3 Q T AQ = T Q AQ = QT *8 3 3 2 c 203 Information Processing Society of Japan 4

A[q, q 2,...,q n ] α β. β α.. 2 =[q, q 2,...,q n ]...... βn β n α n Aq = α q + β q 2 q T qt q 2 = 0 α = q T Aq r =(A α I)q β = r q 2 = r β 2 j Aq j = β j q j + α j q j + β j q j+ (8) α j = q T j Aq j q j+ r j =(A α j I)q j β j q j (9) β j = r j q j+ = r j q j β j α j q j β j (8) Aq j = s j q j + α j q j + s j q j+ 3 2 q = sp 3.2 CG 3 A Ax = b CG CG [9] 3 3 [0], p. 528 γ k = p T k Ap k, δk = r k, β k (0) q j r k *9 β k β k *9 q CG r 0 3 β k 2 β 2... B k =, ()... βk Δ 3 T k T k =Δ Bk T Pk T AP k B k Δ γ =Δ Bk T... γ k B kδ (2) B T k P T k AP kb k 3 d i s i T k d s s d 2 s 2 T k =Δ. s.. Δ 2 d k = δ d δ δ 2 s δ δ 2 s δ 2 d 2 δ2 δ 3 s 2 δ2 δ 3 s 2... δ k d k (3) 3 2 δ k * 0 3 T k ( δ i δ i+ s i ) 2 3 ω T ωi = A ωi *0 q k r k CG r k 2 r k = p k β k p k β k 2 2 r T k r k c 203 Information Processing Society of Japan 5

4. 4. 2 32 32 64 32,768 0 3 FFT FFT 2 32 2 24 FFT 2 24 2 8 = 256 0 800 2 GCD GCD [2][3] [9] longint rational 4 C++ longint rational 4.2 LDL LDL T A u ij l ij l ji = u ij /u ii A = LDL T D U = DL T t ij = l ji L T i u ij = a ij t ki u kj, (i j) (4) k= t ij = u ij u ii (i<j) (5) Fortran C n * a ij = n max(i, j) + (6) (4) a ij 0.5 4 n = 00 0.7 3.8 92.6 a ij 0.5 989.3 Intel Core i5 2.67GHz D d i n i + n i +2 O(n 3 ) D d i 2 * 2 * n = 5 LDL T L 5 4 3 2 0 0 0 0 4 4 3 2 4 3 3 3 2, 0 0 0 5 3 3 0 0 5 4 D 2 2 2 2 2 2 2 0 5 4 3 5 4 3 2 diag ( 5, 4 5, 3 4, 2 3, 2 ) *2 A n d i ill-condition c 203 Information Processing Society of Japan 6

O(n 4 ) a ij 0.5 d i 33 d 300 0,000 (4) 2 O(n 4 ) O(n 5 ) O(n 5 ) FFT LDL T LDL T 00 000 O(n 3 ) O(n 5 ) 4.3 3 4.3. 3 (3) 3 d i,s i,δ i CG k =0; r 0 = initialize; δ = r 0 2 β =; d = γ ; s = γ δ do k = k + if(k =) p = r 0 else β k = rt k r k r T k 2 r k 2 p k = r k + β k p k end α k = k r rt k p T Ap k k γ k = p T k Ap k if(k >) d k = γ k + γ k βk 2 δ k s k = γ k β k end r k = r k α k Ap k δ k+ = r k 2 if(δ k+ =0)exitdo loop CG n 3 n 3 T b i 0 a b 2 b 2 a 2 b 3 T = b 3 a 3... T i T i T i ωi p i (ω) =det(t i ωi) i n p i (ω) p 0 (ω) = p (ω) =a ω do i =2,n p i (ω) =(a i ω)p i (ω) b 2 i p i 2(ω) loop b 2 i δ iδ i+ s 2 i 4.3.2 [ω l,ω u ] dω = ωu ω l n k =0; dω l = ω l ; ω = ω l do dω u = dω l + dω call TDCdet(A, dω u,f,no) if(no k = ) then call Bisect(A, dω l,dω u,ω) k = k + dω l = dω u dω = ωu ω l n else if(no k = 0) then else end if dω l = dω u dω = dω 2 c 203 Information Processing Society of Japan 7

loop if(k = n) then exit loop TDCdet 3 T ω f(ω) = T ωi f ω no (dω l,dω u ) (a, b) Bisect 2 SUB Bisect(A, a, b, ω) call TDCdet(A, a, fa, no) call TDCdet(A, b, fb, no) do ( ) fa (b a) c = ratround a fb fa, ord call TDCdet(A, c, fc, no) if(fc =0)thenexitloop if(fa fc < 0) then b = c; fb = fc else a = c; fa = fc end if if( fa fb <ɛ)thenexitloop loop ω = c 2 2 (a, f(a)) (b, f(b)) x x c = a f(a) (b a) f(b) f(a) (7) ratround 4.3.3 2 2 ω 0 ω n n + det(a ω i I) characteristic polynomial n + n + n =3 27 9 3 a 0 det(a 3I) 8 4 2 a a = det(a 2I) 2 det(a I) 0 0 0 det(a) a 3 ω 0 ω n 0 n a 0 a n 2 * 3 f(ω) =a 0 ω n + a ω n + + a n =0 2 2 ratround n 3 * 4 n =4 (5) ratround m a b f(a) f(b) 4.3.4 x j =(A λ i I) x j (8) λ i 3 3 [] 3 3 T Wilkinson A [2] 3 *3 2 *4 n =4 x =(, 0,, ) T Ax = x c 203 Information Processing Society of Japan 8

3 32 33 34 35 36 25 30 9 24 3 8 7 2 2 3 4 5 6 T =Δ Bk T Pk T A P k B k Δ A A A T A 3 T A A 3 3 4.4 4.4. l m m 5 [7] 4 4 3 2 2 i λ i λ i 0.763932... 2,3.763932... 73, 95 4 2.763932... 5,6 3. 0027, 0044 7,8,9,0 4. 9982, 0044,2 5. 999, 0000 3 5.236068... 4,5 6.236068... 863, 889 6 7.236068... 4 0 A = 4 0 0 4 0 4 4 4 2 m m 4 m 3 q * 5 CG 3 λ λ 5 λ 3 i i CG 6 λ 2 λ 7 λ 4 i MOD(i 2, 6) CG 9 * 6 CG 3 3 T k 3 2 *5 *6 28 0 c 203 Information Processing Society of Japan 9

4.4.2 3 6 * 7 2 print * 8 0. 4 4 a ii =0.4 A a a (2) (3) 0 20 i =5, 6.29999999999999988048.3000000000000005837 6 * 9 (λ m) k k * 20 Fortran C *7 R print digit 6 *8 (a + b)+c a +(b + c) *9 BASIC *20 (λ m) k m 5. Ax = λx A k...3 k 3 BASIC Fortran C c 203 Information Processing Society of Japan 20

* 2 976 Cray- 60M flop/s 202 Blue Gene/Q 6P flop/s [3] LU 000 * 22 CG (6) LDL T i, j d i l ij A. LDL Fortran [7], p. 28 Fortran 2 C++ rational void LDL(rational_matrix& a, int n){ rational s,t; *2 *22 fail } int i,j,k; for (j=; j < n; ++j) { for (i=; i < j; ++i) { s=0; for (k=0; k < i; ++k) { s = s + a[k][i] * a[k][j]; } a[i][j] = a[i][j] - s; } s=0; for (k=0; k <= j-; ++k) { t = a[k][j]/a[k][k]; s = s + t * a[k][j]; a[k][j] = t; } a[j][j] = a[j][j] - s; } [] B. Sinharoy, R. N. Kalla, and et.al., IBM Power7 Multicore Server Processor, IBM J. Res. Develop., 55(3):/ /29, 20. [2] D. Knuth, The Art of Computer Programming, Volume 2, Addison-Wesley, 969. / 98. [3] D. Knuth, The Art of Computer Programming, Volume 2, Third Edition, Addison-Wesley, 998. The Art of Computer Programming, Third Edition 2004. [4] http://hp.vector.co.jp/authors/va008683/. [5],, B,, 2008. [6] Hikaru Samukawa, Rational Number Arithmetic Including Square Root Handling, In 3th JSST Annual Conference (JSST 202) International Conference in Modeling and Simulation Technology, 202. [7],,,, HPC ITText,, 2009. [8] J. H. Wilkinson, Algebraic Eigenvalue Problem, Oxford University Press, 965. [9] Hikaru Samukawa, A Study of Rational Number Arithmetic, In 30th JSST Annual Conference (JSST 20) International Conference in Modeling and Simulation Technology, 20. [0] G. H. Golub and C. F. Van Loan, Matrix Computations third edition, Johns Hopkins University Press, 996. [] I. S. Dhillon, A New O(n 2 ) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem, 989. [2] J. H. Wilkinson, Rounding Errors in Algebraic Processes, Her Britannic Majesty s Stationery Office, 963,, 974. c 203 Information Processing Society of Japan 2

[3] https://www.llnl.gov/news/newsreleases/202/jun/nr- 2-06-07.html. c 203 Information Processing Society of Japan 22