線形代数演算ライブラリBLASとLAPACKの 基礎と実践1

Similar documents
線形代数演算ライブラリBLASとLAPACKの 基礎と実践1

線形代数演算ライブラリBLASとLAPACKの 基礎と実践1

_Vol16No2.indd

CMSI教育計算科学技術特論A_中田真秀

倍々精度RgemmのnVidia C2050上への実装と応用

untitled

_Vol16No3.indd

4 倍精度基本線形代数ルーチン群 QPBLAS の紹介 [index] 1. Introduction 2. Double-double algorithm 3. QPBLAS 4. QPBLAS-GPU 5. Summary 佐々成正 1, 山田進 1, 町田昌彦 1, 今村俊幸 2, 奥田洋司

14 2 Scilab Scilab GUI インタグラフ プリタ描画各種ライブラリ (LAPACK, ODEPACK, ) SciNOTES ハードウェア (CPU, GPU) 21 Scilab SciNotes 呼び出し 3 変数ブラウザ 1 ファイルブラウザ 2 コンソール 4 コマンド履歴

PC Windows 95, Windows 98, Windows NT, Windows 2000, MS-DOS, UNIX CPU

211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G

1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit)

2008 ( 13 ) C LAPACK 2008 ( 13 )C LAPACK p. 1

/* sansu1.c */ #include <stdio.h> main() { int a, b, c; /* a, b, c */ a = 200; b = 1300; /* a 200 */ /* b 200 */ c = a + b; /* a b c */ }

Untitled

理研スーパーコンピュータ・システム

[1] #include<stdio.h> main() { printf("hello, world."); return 0; } (G1) int long int float ± ±

A11 (1993,1994) 29 A12 (1994) 29 A13 Trefethen and Bau Numerical Linear Algebra (1997) 29 A14 (1999) 30 A15 (2003) 30 A16 (2004) 30 A17 (2007) 30 A18

(Basic Theory of Information Processing) 1

ストリーミング SIMD 拡張命令2 (SSE2) を使用した SAXPY/DAXPY

漸化式のすべてのパターンを解説しましたー高校数学の達人・河見賢司のサイト

1 1.1 C 2 1 double a[ ][ ]; 1 3x x3 ( ) malloc() malloc 2 #include <stdio.h> #include

Second-semi.PDF

bash on Ubuntu on Windows bash on Ubuntu on Windows bash on Ubuntu on Windows bash on Ubuntu on Windows bash on Ubuntu on Windows ˆ Windows10 64bit Wi

C のコード例 (Z80 と同機能 ) int main(void) { int i,sum=0; for (i=1; i<=10; i++) sum=sum + i; printf ("sum=%d n",sum); 2

. UNIX, Linux, KNOPPIX. C,.,., ( 1 ) p. 2

インテル(R) Visual Fortran Composer XE

1 4 2 EP) (EP) (EP)

tuat1.dvi

untitled

comment.dvi

ex01.dvi

1.ppt

I ASCII ( ) NUL 16 DLE SP P p 1 SOH 17 DC1! 1 A Q a q STX 2 18 DC2 " 2 B R b

2 2.1 Mac OS CPU Mac OS tar zxf zpares_0.9.6.tar.gz cd zpares_0.9.6 Mac Makefile Mekefile.inc cp Makefile.inc/make.inc.gfortran.seq.macosx make

/* do-while */ #include <stdio.h> #include <math.h> int main(void) double val1, val2, arith_mean, geo_mean; printf( \n ); do printf( ); scanf( %lf, &v

第101回 日本美容外科学会誌/nbgkp‐01(大扉)

27巻3号/FUJSYU03‐107(プログラム)

パーキンソン病治療ガイドライン2002

ex01.dvi

tnbp59-20_Web:P1/ky108679509610002943

r1.dvi

新・明解C言語で学ぶアルゴリズムとデータ構造

…J…−†[†E…n…‘†[…hfi¯„^‚ΛžfiüŒå

(2 Linux Mozilla [ ] [ ] [ ] [ ] URL 2 qkc, nkc ~/.cshrc (emacs 2 set path=($path /usr/meiji/pub/linux/bin tcsh b

KBLAS[7] *1., CUBLAS.,,, Byte/flop., [13] 1 2. (AT). GPU AT,, GPU SYMV., SYMV CUDABLAS., (double, float) (cu- FloatComplex, cudoublecomplex).,, DD(dou

C言語によるアルゴリズムとデータ構造

Microsoft Word - no14.docx

Microsoft Word - C.....u.K...doc

3.2 Linux root vi(vim) vi emacs emacs 4 Linux Kernel Linux Git 4.1 Git Git Linux Linux Linus Fedora root yum install global(debian Ubuntu apt-get inst

アセンブラ入門(CASL II) 第3版

C¥×¥í¥°¥é¥ß¥ó¥° ÆþÌç

01_OpenMP_osx.indd

untitled


2 1. Ubuntu 1.1 OS OS OS ( OS ) OS ( OS ) VMware Player VMware Player jp/download/player/ URL VMware Plaeyr VMware

1 n A a 11 a 1n A =.. a m1 a mn Ax = λx (1) x n λ (eigenvalue problem) x = 0 ( x 0 ) λ A ( ) λ Ax = λx x Ax = λx y T A = λy T x Ax = λx cx ( 1) 1.1 Th

Krylov (b) x k+1 := x k + α k p k (c) r k+1 := r k α k Ap k ( := b Ax k+1 ) (d) β k := r k r k 2 2 (e) : r k 2 / r 0 2 < ε R (f) p k+1 :=

‚æ2›ñ C„¾„ê‡Ìš|

II ( ) prog8-1.c s1542h017%./prog8-1 1 => 35 Hiroshi 2 => 23 Koji 3 => 67 Satoshi 4 => 87 Junko 5 => 64 Ichiro 6 => 89 Mari 7 => 73 D


1 C STL(1) C C C libc C C C++ STL(Standard Template Library ) libc libc C++ C STL libc STL iostream Algorithm libc STL string vector l

OpenMP (1) 1, 12 1 UNIX (FUJITSU GP7000F model 900), 13 1 (COMPAQ GS320) FUJITSU VPP5000/64 1 (a) (b) 1: ( 1(a))

joho07-1.ppt

I. Backus-Naur BNF S + S S * S S x S +, *, x BNF S (parse tree) : * x + x x S * S x + S S S x x (1) * x x * x (2) * + x x x (3) + x * x + x x (4) * *

:30 12:00 I. I VI II. III. IV. a d V. VI

Transcription:

.. BLAS LAPACK 1, 2013/05/23 CMSI A 1 / 43

BLAS LAPACK (I) BLAS, LAPACK BLAS : - LAPACK : 2 / 43

: 3 / 43

(wikipedia) V : f : V u, v u + v u + v V α K u V αu V V x, y f (x + y) = f (x) + f (y) V x K α f (αx) = α f (x) :BLAS, LAPACK 3 / 43

( ) 1000 ( ; 1 2 ) :... 4 / 43

( ) 1000 ( ; 1 2 ) :... 4 / 43

( ) 1000 ( ; 1 2 ) :... 4 / 43

( ) 1000 ( ; 1 2 ) :... 4 / 43

( ) 1000 ( ; 1 2 ) :... 4 / 43

Google Page Rank Ax = λx 3D O AO ( ) ( ) U HU = diag(λ 1, λ 2, ) 5 / 43

Google Page Rank Ax = λx 3D O AO ( ) ( ) U HU = diag(λ 1, λ 2, ) 5 / 43

Google Page Rank Ax = λx 3D O AO ( ) ( ) U HU = diag(λ 1, λ 2, ) 5 / 43

Google Page Rank Ax = λx 3D O AO ( ) ( ) U HU = diag(λ 1, λ 2, ) 5 / 43

( 1 2 ) : 6 / 43

( 1 2 ) +Google trans. : 3 2 1 39 2 3 1 34 1 2 3 26 : 9 1 4, 4 1 4, 2 3 4 3 3 1 39 ( )... 7 / 43

( 1 2 ) +Google trans. : 3x + 2y + z = 39 ( ) 2x + 3y + z = 34 ( ) x + 2y + 3z = 26 ( ) ( ) ( ) ( ) 3 ( ) 2 ( ) 3 ( ) ( ) 3(2x + 3y + z = 34) 2(3x + 2y + z = 39) 5y + z = 24 ( ) 3(x + 2y + 3z = 39) 3x + 2y + z = 39 4y + 8z = 39 ( ) ( ) 5 3x + 2y + z = 39 ( ) 5y + z = 24 ( ) 20y + 40z = 195 ( ) ( )-( )x4 36z = 99 8 / 43

ENIAC ENIAC( Electronic Numerical Integrator and Computer 1946 ) 10 14 2800 357 (Wikipedia ) 9 / 43

10 / 43

: 1.01010101 2 3 10 16 1 + 0.0000000000000001 = 1 a + (b + c) (a + b) + c 11 / 43

: 1.01010101 2 3 10 16 1 + 0.0000000000000001 = 1 a + (b + c) (a + b) + c 11 / 43

: 1.01010101 2 3 10 16 1 + 0.0000000000000001 = 1 a + (b + c) (a + b) + c 11 / 43

: 1.01010101 2 3 10 16 1 + 0.0000000000000001 = 1 a + (b + c) (a + b) + c 11 / 43

: 1.01010101 2 3 10 16 1 + 0.0000000000000001 = 1 a + (b + c) (a + b) + c 11 / 43

: 754-2008 IEEE Standard for Floating-Point Arithmetic binary64 ( ) 10 16 :Core i7 920: 40GFLOPS; RADEON HD7970 1000GFLOPS, 10PFLOPS) : 12 / 43

13 / 43

... - :, 0... 14 / 43

... - :, 0... 14 / 43

... - :, 0... 14 / 43

... - :, 0... 14 / 43

? BLAS, LAPACK? 15 / 43

? BLAS, LAPACK? 15 / 43

? BLAS, LAPACK? 15 / 43

? BLAS, LAPACK? 15 / 43

? BLAS, LAPACK? 15 / 43

BLAS, LAPACK: BLAS+LAPACK ( OS ) ( ) BLAS, LAPACK! 16 / 43

BLAS, LAPACK: BLAS+LAPACK ( OS ) ( ) BLAS, LAPACK! 16 / 43

BLAS, LAPACK: BLAS+LAPACK ( OS ) ( ) BLAS, LAPACK! 16 / 43

BLAS, LAPACK: BLAS+LAPACK ( OS ) ( ) BLAS, LAPACK! 16 / 43

BLAS, LAPACK: BLAS+LAPACK ( OS ) ( ) BLAS, LAPACK! 16 / 43

BLAS, LAPACK: BLAS+LAPACK ( OS ) ( ) BLAS, LAPACK! 16 / 43

BLAS, LAPACK: BLAS+LAPACK ( OS ) ( ) BLAS, LAPACK! 16 / 43

BLAS? BLAS Basic Linear Algebra Subprograms - - - FORTRAN77 (reference BLAS) BLAS! http://www.netlib.org/blas 17 / 43

BLAS? BLAS Basic Linear Algebra Subprograms - - - FORTRAN77 (reference BLAS) BLAS! http://www.netlib.org/blas 17 / 43

BLAS? BLAS Basic Linear Algebra Subprograms - - - FORTRAN77 (reference BLAS) BLAS! http://www.netlib.org/blas 17 / 43

BLAS? BLAS Basic Linear Algebra Subprograms - - - FORTRAN77 (reference BLAS) BLAS! http://www.netlib.org/blas 17 / 43

BLAS? BLAS Basic Linear Algebra Subprograms - - - FORTRAN77 (reference BLAS) BLAS! http://www.netlib.org/blas 17 / 43

Level 1 BLAS BLAS Level 1, 2, 3 Level 1: - (+ ) (DAXPY), (DDOT) y αx + y, (1) dot x T y, (2) 15,,,, 4. 18 / 43

Level 2 BLAS BLAS Level 1, 2, 3 Level 2: - - (DGEMV) y αax + βy, (3) (DTRSV) x A 1 b, (4) 25, 4 19 / 43

Level 3 BLAS BLAS Level 1, 2, 3 Level 3 BLAS - - (DGEMM), - DSYRK, C αab + βc (5) C αaa T + βc (6) DTRSM 9 B αa 1 B (7) 20 / 43

BLAS : s, d, c, z LEVEL1 BLAS zrotg zdcal drotg drot drotm zdrot zswap dswap zdscal dscal zcopy dcopy zaxpy daxpy ddot zdotc zdotu dznrm2 dnrm2 dasum izasum idamax dzabs1 LEVEL2 BLAS zgemv dgemv zgbmv dgbmv zhemv zhbmv zhpmv dsymv dsbmv ztrmv zgemv dgemv zgbmv dgemv zhemv zhbmv zhpmv dsymv dsbmv dspmv ztrmv dtrmv ztbmv ztpmv dtpmv ztrsv dtrsv ztbsv dtbsv ztpsv dger zgeru zgerc zher zhpr zher2 zhpr2 dsyr dspr dsyr2 dspr2 LEVEL3 BLAS zgemm dgemm zsymm dsymm zhemm zsyrk dsyrk zherk zsyr2k dsyr2k zher2k ztrmm dtrmm ztrsm dtrsm 21 / 43

LAPACK? LAPACK(Linear Algebra PACKage),. BLAS. : (LU,, QR,, Schur, Schur ),, CPU OS Fortran 90 3.4.2 1600 web 1 1600! http://www.netlib.org/lapack 22 / 43

BLAS, LAPACK BLAS, LAPACK. Gaussian, Gamess, ADF, VASP CPLEX, NUOPT, GLPK.. Ruby, Python, Perl, Java, C, Mathematica, Maple, Matlab, R, octave, SciLab 23 / 43

Top 500 Top 500: Top 500, LINPACK., DGEMM -, http://www.top500.org/ 24 / 43

BLAS, LAPACK : BLAS, LAPACK Reference BLAS CPU GotoBLAS2: GotoBLAS2, BLAS, LAPACK. CPU, OS. ( ). BLAS, LAPACK 3.1.1 SandyBridge OpenBLAS: Zhang Xianyi GotoBLAS2 SandyBridge ICT Loongson-3A, 3B 25 / 43

BLAS, LAPACK : BLAS, LAPACK ATLAS:R. Clint Whaley, BLAS 2001 BLAS, BLAS % 10% GPU BLAS, LAPACK: CPU,. CPU, 10,. nvidia GPU MAGMA BLAS, LAPACK: ScaLAPACK. 26 / 43

BLAS, LAPACK :Column major or Row major 2 1 ( ) 1 2 3 A = 4 5 6 column major, row major. 1, 4, 2, 5, 3, 6 column major FORTRAN Matlab, octave column major 27 / 43

BLAS, LAPACK :Column major or Row major A = ( 1 2 ) 3 4 5 6 1, 2, 3, 4, 5, 6 row major C, C++ row major C/C++ BLAS, LAPACK major!! 28 / 43

BLAS, LAPACK :leading dimension ( LU, Chokesky LAPACK ), leading dimension BLAS, LAPACK LDA, LDB. FORTRAN A(i + j m), A(i + j lda) M N A LDA N 29 / 43

BLAS, LAPACK : 0 1? FORTRAN 1, C, C++, 0. 1 N (FORTRAN), 0 n (C,C++). x i FORTRAN X(I), C x[i 1]. A i, j FORTRAN A(I, J), C column major A[i 1 + ( j 1) lda] 30 / 43

BLAS, LAPACK : OS BLAS BLAS (CBLAS? C FORTRAN ) Fortran integer 32bit 64bit? (64bit BLAS ) GPU Linux MacOSX Windows Ubuntu 12.04 x86 31 / 43

BLAS LAPACK Ubuntu 12.04 BLAS, LAPACK C++ - 32 / 43

BLAS LAPACK Ubuntu 12.04 BLAS LAPACK $ sudo apt-get install gfortran g++ libblas-dev liblapack-dev......... $ sudo apt-get install gfortran g++ libblas-dev liblapack-dev... g++ gfortran libblas-dev liblapack-dev : 0 : 0 : 0 : 172 33 / 43

- - DGEMM. 1 8 3 9 8 3 3 3 1.2 A = 2 10 8 B = 3 11 2.3 C = 8 4 8 9 5 1 8 6 1 6 1 2 α = 3, β = 2, C αab + βc. 21 336 70.8 64 514 95 210 31 47.5 34 / 43

- :DGEMM CBLAS void F77_dgemm(const char *transa, const char *transb, int m, int n, int k, const double * alpha, const double *A, int lda, const double * B, int ldb, const double *beta, double *C, int ldc); transa, transb, transc A, B, C Row major M, N, K alpha, beta 35 / 43

- I #include <stdio.h> extern "C" { #define ADD_ #include <cblas_f77.h> } //Matlab/Octave format void printmat(int N, int M, double *A, int LDA) { double mtmp; printf("[ "); for (int i = 0; i < N; i++) { printf("[ "); for (int j = 0; j < M; j++) { mtmp = A[i + j * LDA]; printf("%5.2e", mtmp); if (j < M - 1) printf(", "); } if (i < N - 1) printf("]; "); else printf("] "); } printf("]"); } int main() { int n = 3; double alpha, beta; double *A = new double[n*n]; double *B = new double[n*n]; double *C = new double[n*n]; A[0+0*n]=1; A[0+1*n]= 8; A[0+2*n]= 3; A[1+0*n]=2; A[1+1*n]=10; A[1+2*n]= 8; A[2+0*n]=9; A[2+1*n]=-5; A[2+2*n]=-1; B[0+0*n]= 9; B[0+1*n]= 8; B[0+2*n]=3; B[1+0*n]= 3; B[1+1*n]=11; B[1+2*n]=2.3; B[2+0*n]=-8; B[2+1*n]= 6; B[2+2*n]=1; C[0+0*n]=3; C[0+1*n]=3; C[0+2*n]=1.2; C[1+0*n]=8; C[1+1*n]=4; C[1+2*n]=8; C[2+0*n]=6; C[2+1*n]=1; C[2+2*n]=-2; 36 / 43

- II printf("# dgemm demo...\n"); printf("a =");printmat(n,n,a,n);printf("\n"); printf("b =");printmat(n,n,b,n);printf("\n"); printf("c =");printmat(n,n,c,n);printf("\n"); alpha = 3.0; beta = -2.0; F77_dgemm("n", "n", &n, &n, &n, &alpha, A, &n, B, &n, &beta, C, &n); printf("alpha = %5.3e\n", alpha); printf("beta = %5.3e\n", beta); printf("ans="); printmat(n,n,c,n); printf("\n"); printf("#check by Matlab/Octave by:\n"); printf("alpha * A * B + beta * C =\n"); delete[]c; delete[]b; delete[]a; 37 / 43

- dgemm_demo.cpp $ g++ dgemm_demo.cpp -o dgemm_demo -lblas -lapack., Octave Matlab & $./dgemm_demo # dgemm demo... A =[ [ 1.00e+00, 8.00e+00, 3.00e+00]; [ 2.00e+00, 1.00e+01, 8.00e+00]; [ 9.00e+00, -5.00e+00, -1.00e+00] ] B =[ [ 9.00e+00, 8.00e+00, 3.00e+00]; [ 3.00e+00, 1.10e+01, 2.30e+00]; [ -8.00e+00, 6.00e+00, 1.00e+00] ] C =[ [ 3.00e+00, 3.00e+00, 1.20e+00]; [ 8.00e+00, 4.00e+00, 8.00e+00]; [ 6.00e+00, 1.00e+00, -2.00e+00] ] alpha = 3.000e+00 beta = -2.000e+00 ans=[ [ 2.10e+01, 3.36e+02, 7.08e+01]; [ -6.40e+01, 5.14e+02, 9.50e+01]; [ 2.10e+02, 3.10e+01, 4.75e+01] ] #check by Matlab/Octave by: alpha * A * B + beta * C 38 / 43

LAPACK : :DSYEV 3 3 1 2 3 A = 2 5 4 3 4 6 Av i = λ i v i (i = 1, 2, 3) λ 1, λ 2, λ 3 v 1, v 2, v 3 0.40973, 1.57715, 10.83258 v 1 = ( 0.914357, 0.216411, 0.342225) v 2 = (0.040122, 0.792606, 0.608413) v 3 = (0.402916, 0.570037, 0.716042) 39 / 43

DSYEV Fortran dsyev_f77(const char *jobz, const char *uplo, int *n, double *A, int *lda, double *w, double *work, int *lwork, int *info); jobz: uplo: A, lda: A leading dimension w: ( ) work, lwork: info: =0 <0: INFO=-i i >0: INFO=i 40 / 43

#include <iostream> #include <stdio.h> extern "C" int dsyev_(const char *jobz, const char *uplo, int *n, double *a, int *lda, double *w, double *work, int *lwork, int *info); //Matlab/Octave format void printmat(int N, int M, double *A, int LDA) { double mtmp; printf("[ "); for (int i = 0; i < N; i++) { printf("[ "); for (int j = 0; j < M; j++) { mtmp = A[i + j * LDA]; printf("%5.2e", mtmp); if (j < M - 1) printf(", "); } if (i < N - 1) printf("]; "); else printf("] "); } printf("]"); } int main() { int n = 3; int lwork, info; double *A = new double[n*n]; double *w = new double[n]; //setting A matrix A[0+0*n]=1;A[0+1*n]=2;A[0+2*n]=3; A[1+0*n]=2;A[1+1*n]=5;A[1+2*n]=4; A[2+0*n]=3;A[2+1*n]=4;A[2+2*n]=6; printf("a ="); printmat(n, n, A, n); printf("\n"); lwork = -1; double *work = new double[1]; dsyev_("v", "U", &n, A, &n, w, work, &lwork, &info); lwork = (int) work[0]; delete[]work; work = new double[std::max((int) 1, lwork)]; //get Eigenvalue dsyev_("v", "U", &n, A, &n, w, work, &lwork, &info); //print out some results. printf("#eigenvalues \n"); printf("w ="); printmat(n, 1, w, 1); printf("\n"); printf("#eigenvecs \n"); printf("u ="); printmat(n, n, A, n); printf("\n"); printf("#check Matlab/Octave by:\n"); printf("eig(a)\n"); printf("u *A*U\n"); delete[]work; delete[]w; delete[]a; 41 / 43

eigenvalue_demo.cpp $ g++ eigenvalue_demo.cpp -o eigenvalue_demo -lblas -lapack -lgfortran, Octave Matlab & A =[ [ 1.00e+00, 2.00e+00, 3.00e+00];\ [ 2.00e+00, 5.00e+00, 4.00e+00];\ [ 3.00e+00, 4.00e+00, 6.00e+00] ] #eigenvalues w =[ [ -4.10e-01]; [ 1.58e+00]; [ 1.08e+01] ] #eigenvecs U =[ [ -9.14e-01, 2.16e-01, 3.42e-01];\ [ 4.01e-02, -7.93e-01, 6.08e-01];\ [ 4.03e-01, 5.70e-01, 7.16e-01] ] #Check Matlab/Octave by: eig(a) U *A*U 42 / 43

BLAS, LAPACK BLAS, LAPACK BLAS, LAPACK 43 / 43