線形代数演算ライブラリBLASとLAPACKの 基礎と実践1

Similar documents
線形代数演算ライブラリBLASとLAPACKの 基礎と実践1

線形代数演算ライブラリBLASとLAPACKの 基礎と実践1

_Vol16No2.indd

CMSI教育計算科学技術特論A_中田真秀

untitled

untitled

倍々精度RgemmのnVidia C2050上への実装と応用

MBLAS¤ÈMLAPACK; ¿ÇÜĹÀºÅÙÈǤÎBLAS/LAPACK¤ÎºîÀ®

_Vol16No3.indd

4 倍精度基本線形代数ルーチン群 QPBLAS の紹介 [index] 1. Introduction 2. Double-double algorithm 3. QPBLAS 4. QPBLAS-GPU 5. Summary 佐々成正 1, 山田進 1, 町田昌彦 1, 今村俊幸 2, 奥田洋司

211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G

1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit)

Untitled

理研スーパーコンピュータ・システム

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2

PC Windows 95, Windows 98, Windows NT, Windows 2000, MS-DOS, UNIX CPU

Minsky の電力消費量の調査 (1) ディープラーニングデモプログラム (mnist) の実行デモプログラム 添付 1 CPU 実行 シングル GPU 実行 2GPU 実行での計算時間と電力消費量を比較した 〇計算時間 CPU 実行 シングル GPU 実行複数 GPU 実行 今回は 2GPU 計

14 2 Scilab Scilab GUI インタグラフ プリタ描画各種ライブラリ (LAPACK, ODEPACK, ) SciNOTES ハードウェア (CPU, GPU) 21 Scilab SciNotes 呼び出し 3 変数ブラウザ 1 ファイルブラウザ 2 コンソール 4 コマンド履歴

/* sansu1.c */ #include <stdio.h> main() { int a, b, c; /* a, b, c */ a = 200; b = 1300; /* a 200 */ /* b 200 */ c = a + b; /* a b c */ }

AutoTuned-RB

[1] #include<stdio.h> main() { printf("hello, world."); return 0; } (G1) int long int float ± ±

PowerPoint プレゼンテーション

A11 (1993,1994) 29 A12 (1994) 29 A13 Trefethen and Bau Numerical Linear Algebra (1997) 29 A14 (1999) 30 A15 (2003) 30 A16 (2004) 30 A17 (2007) 30 A18

2008 ( 13 ) C LAPACK 2008 ( 13 )C LAPACK p. 1

comment.dvi

C による数値計算法入門 ( 第 2 版 ) 新装版 サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. このサンプルページの内容は, 新装版 1 刷発行時のものです.

supercomputer2010.ppt

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N

ex12.dvi

/* do-while */ #include <stdio.h> #include <math.h> int main(void) double val1, val2, arith_mean, geo_mean; printf( \n ); do printf( ); scanf( %lf, &v


漸化式のすべてのパターンを解説しましたー高校数学の達人・河見賢司のサイト

ストリーミング SIMD 拡張命令2 (SSE2) を使用した SAXPY/DAXPY

bash on Ubuntu on Windows bash on Ubuntu on Windows bash on Ubuntu on Windows bash on Ubuntu on Windows bash on Ubuntu on Windows ˆ Windows10 64bit Wi

インテル(R) Visual Fortran Composer XE

‚æ4›ñ

XMPによる並列化実装2

0530cmsi教育計算科学技術特論a_中田真秀 (nakata maho's conflicted copy) (6)

07-二村幸孝・出口大輔.indd

Microsoft Word - C.....u.K...doc

kiso2-09.key

Informatics 2014

(Basic Theory of Information Processing) 1

2 2.1 Mac OS CPU Mac OS tar zxf zpares_0.9.6.tar.gz cd zpares_0.9.6 Mac Makefile Mekefile.inc cp Makefile.inc/make.inc.gfortran.seq.macosx make

1 1.1 C 2 1 double a[ ][ ]; 1 3x x3 ( ) malloc() 2 double *a[ ]; double 1 malloc() dou

並列計算の数理とアルゴリズム サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. このサンプルページの内容は, 初版 1 刷発行時のものです.

KBLAS[7] *1., CUBLAS.,,, Byte/flop., [13] 1 2. (AT). GPU AT,, GPU SYMV., SYMV CUDABLAS., (double, float) (cu- FloatComplex, cudoublecomplex).,, DD(dou

1 1.1 C 2 1 double a[ ][ ]; 1 3x x3 ( ) malloc() malloc 2 #include <stdio.h> #include

2 1. Ubuntu 1.1 OS OS OS ( OS ) OS ( OS ) VMware Player VMware Player jp/download/player/ URL VMware Plaeyr VMware

TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日

C 2 / 21 1 y = x 1.1 lagrange.c 1 / Laglange / 2 #include <stdio.h> 3 #include <math.h> 4 int main() 5 { 6 float x[10], y[10]; 7 float xx, pn, p; 8 in

(Version: 2017/4/18) Intel CPU 1 Intel CPU( AMD CPU) 64bit SIMD Inline Assemler Windows Visual C++ Linux gcc 2 FPU SSE2 Intel CPU do

joho09.ppt

. a, b, c, d b a ± d bc ± ad = c ac b a d c = bd ac b a d c = bc ad n m nm [2][3] BASIC [4] B BASIC [5] BASIC Intel x * IEEE a e d

untitled


ex01.dvi

ex01.dvi

QR

64bit SSE2 SSE2 FPU Visual C++ 64bit Inline Assembler 4 FPU SSE2 4.1 FPU Control Word FPU 16bit R R R IC RC(2) PC(2) R R PM UM OM ZM DM IM R: reserved

GPU n Graphics Processing Unit CG CAD

Microsoft Word - HOKUSAI_system_overview_ja.docx

Informatics 2015

GPU GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1

( CUDA CUDA CUDA CUDA ( NVIDIA CUDA I

(2 Linux Mozilla [ ] [ ] [ ] [ ] URL 2 qkc, nkc ~/.cshrc (emacs 2 set path=($path /usr/meiji/pub/linux/bin tcsh b

numb.dvi

Informatics 2010.key

Microsoft Word - Sample_CQS-Report_English_backslant.doc

. UNIX, Linux, KNOPPIX. C,.,., ( 1 ) p. 2

8 / 0 1 i++ i 1 i-- i C !!! C 2

1 4 2 EP) (EP) (EP)


Gauss

スパコンに通じる並列プログラミングの基礎

> > <., vs. > x 2 x y = ax 2 + bx + c y = 0 2 ax 2 + bx + c = 0 y = 0 x ( x ) y = ax 2 + bx + c D = b 2 4ac (1) D > 0 x (2) D = 0 x (3

IPSJ SIG Technical Report Vol.2013-HPC-138 No /2/21 GPU CRS 1,a) 2,b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla

Second-semi.PDF

スパコンに通じる並列プログラミングの基礎

I ASCII ( ) NUL 16 DLE SP P p 1 SOH 17 DC1! 1 A Q a q STX 2 18 DC2 " 2 B R b

電気通信大学 I 類 情報系 情報 ネットワーク工学専攻 CED 2018 システム利用ガイド ver1.2 CED 管理者 学術技師 島崎俊介 教育研究技師部 実験実習支援センター 2018 年 3 月 29 日 1 ログイン ログアウト手順について 1.1 ログイン手順 CentOS 1. モニ

¹âÀºÅÙÀþ·ÁÂå¿ô±é»»¥é¥¤¥Ö¥é¥êMPACK¤Î³«È¯

Krylov (b) x k+1 := x k + α k p k (c) r k+1 := r k α k Ap k ( := b Ax k+1 ) (d) β k := r k r k 2 2 (e) : r k 2 / r 0 2 < ε R (f) p k+1 :=

tabaicho3mukunoki.pptx

18 C ( ) hello world.c 1 #include <stdio.h> 2 3 main() 4 { 5 printf("hello World\n"); 6 } [ ] [ ] #include <stdio.h> % cc hello_world.c %./a.o

memo

C のコード例 (Z80 と同機能 ) int main(void) { int i,sum=0; for (i=1; i<=10; i++) sum=sum + i; printf ("sum=%d n",sum); 2

ad bc A A A = ad bc ( d ) b c a n A n A n A A det A A ( ) a b A = c d det A = ad bc σ {,,,, n} {,,, } {,,, } {,,, } ( ) σ = σ() = σ() = n sign σ sign(

( ) ( ) ( ) 2

C

スパコンに通じる並列プログラミングの基礎

Intel® Compilers Professional Editions

6 ( ) 1 / 53

tuat1.dvi

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU

C¥×¥í¥°¥é¥ß¥ó¥° ÆþÌç

01_OpenMP_osx.indd

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted

BLAS の概要

Transcription:

1 / 56 BLAS LAPACK 1, 2017/05/25 CMSI A

2 / 56 BLAS LAPACK (I) BLAS, LAPACK BLAS : - LAPACK :

3 / 56 ( ) 1000 ( ; 1 2 ) :...

3 / 56 ( ) 1000 ( ; 1 2 ) :...

3 / 56 ( ) 1000 ( ; 1 2 ) :...

3 / 56 ( ) 1000 ( ; 1 2 ) :...

3 / 56 ( ) 1000 ( ; 1 2 ) :...

4 / 56 twitter!

convolution( ) - C = AB Google Page Rank Ax = λx, M = UΣV 3D O AO ( ) ( ) U HU = diag(λ 1, λ 2, ) 5 / 56

convolution( ) - C = AB Google Page Rank Ax = λx, M = UΣV 3D O AO ( ) ( ) U HU = diag(λ 1, λ 2, ) 5 / 56

convolution( ) - C = AB Google Page Rank Ax = λx, M = UΣV 3D O AO ( ) ( ) U HU = diag(λ 1, λ 2, ) 5 / 56

convolution( ) - C = AB Google Page Rank Ax = λx, M = UΣV 3D O AO ( ) ( ) U HU = diag(λ 1, λ 2, ) 5 / 56

convolution( ) - C = AB Google Page Rank Ax = λx, M = UΣV 3D O AO ( ) ( ) U HU = diag(λ 1, λ 2, ) 5 / 56

6 / 56 The Rhind Papyrus (the Ahmes Papyrus; BC 1650 ) x + ax = b, x + ax + bx = c

7 / 56 ( 1 2 ) Gauss

8 / 56 ( 1 2 ) :

9 / 56 ( 1 2 ) +Google trans. : 3 2 1 39 2 3 1 34 1 2 3 26 : 9 1 4, 4 1 4, 2 3 4 3 3 1 39 ( )...

( 1 2 ) +Google trans. : 3x + 2y + z = 39 ( ) 2x + 3y + z = 34 ( ) x + 2y + 3z = 26 ( ) ( ) ( ) ( ) 3 ( ) 2 ( ) 3 ( ) ( ) 3(2x + 3y + z = 34) 2(3x + 2y + z = 39) 5y + z = 24 ( ) 3(x + 2y + 3z = 39) 3x + 2y + z = 39 4y + 8z = 39 ( ) ( ) 5 3x + 2y + z = 39 ( ) 5y + z = 24 ( ) 20y + 40z = 195 ( ) ( )-( )x4 36z = 99 10 / 56

11 / 56 1693 1750 1888 1900 : = (= ) 1950 (LU )

12 / 56 ENIAC ENIAC( Electronic Numerical Integrator and Computer 1946 ) 10 14 2800 357 (Wikipedia )

13 / 56

14 / 56 2 32 64 ( ) (fraction) (exponent) a n 0 or 1 f raction {}}{ exponent k ( ) n 1 {}}{ ± 1 + a n 2 m 2 n=1 2 10 4 2 1.011 10 1.011 (2) = 1 + 0 0.5 + 1 0.25 + 1 0.125 = 1.375 (10) 10 4 2 1.101 2 5 10 1.101 2 5 = (1 + 1 0.5 + 0 0.25 + 1 0.125) 32 = 52

14 / 56 2 32 64 ( ) (fraction) (exponent) a n 0 or 1 f raction {}}{ exponent k ( ) n 1 {}}{ ± 1 + a n 2 m 2 n=1 2 10 4 2 1.011 10 1.011 (2) = 1 + 0 0.5 + 1 0.25 + 1 0.125 = 1.375 (10) 10 4 2 1.101 2 5 10 1.101 2 5 = (1 + 1 0.5 + 0 0.25 + 1 0.125) 32 = 52

14 / 56 2 32 64 ( ) (fraction) (exponent) a n 0 or 1 f raction {}}{ exponent k ( ) n 1 {}}{ ± 1 + a n 2 m 2 n=1 2 10 4 2 1.011 10 1.011 (2) = 1 + 0 0.5 + 1 0.25 + 1 0.125 = 1.375 (10) 10 4 2 1.101 2 5 10 1.101 2 5 = (1 + 1 0.5 + 0 0.25 + 1 0.125) 32 = 52

14 / 56 2 32 64 ( ) (fraction) (exponent) a n 0 or 1 f raction {}}{ exponent k ( ) n 1 {}}{ ± 1 + a n 2 m 2 n=1 2 10 4 2 1.011 10 1.011 (2) = 1 + 0 0.5 + 1 0.25 + 1 0.125 = 1.375 (10) 10 4 2 1.101 2 5 10 1.101 2 5 = (1 + 1 0.5 + 0 0.25 + 1 0.125) 32 = 52

15 / 56 : 754-2008 IEEE Standard for Floating-Point Arithmetic binary64 ( ) 10 16 1 1 =Floating point operation per second G = 10 9, T = 10 12, P = 10 15 :Core i7 (Broadwell, 10 cores, 3.5GHz): 560GFLOPS; NVIDIA TESLA P100 5.3TFLOPS, 10PFLOPS), HOKUSAI (1PFlops), (93.01PFlops)

16 / 56

17 / 56

18 / 56

19 / 56! 10 16 1 + 0.0000000000000001 = 1 a + (b + c) (a + b) + c

19 / 56! 10 16 1 + 0.0000000000000001 = 1 a + (b + c) (a + b) + c

19 / 56! 10 16 1 + 0.0000000000000001 = 1 a + (b + c) (a + b) + c

20 / 56!! a float ( (18+a) - a ) (a) 18 (b) 0 (c)

20 / 56!! a float ( (18+a) - a ) (a) 18 (b) 0 (c)

20 / 56!! a float ( (18+a) - a ) (a) 18 (b) 0 (c)

20 / 56!! a float ( (18+a) - a ) (a) 18 (b) 0 (c)

21 / 56 (c) $ cat test.c #include <math.h> #include <stdio.h> int main() { double a = 18.0; double b = pow(2,57); printf("%lf\n", (a+b) - b); } $ gcc test.c ;./a.out 32.000000 ( )

22 / 56

23 / 56... - :, 0...

23 / 56... - :, 0...

23 / 56... - :, 0...

23 / 56... - :, 0...

23 / 56... - :, 0...

23 / 56... - :, 0...

24 / 56? BLAS, LAPACK?

24 / 56? BLAS, LAPACK?

24 / 56? BLAS, LAPACK?

24 / 56? BLAS, LAPACK?

24 / 56? BLAS, LAPACK?

24 / 56? BLAS, LAPACK?

25 / 56 BLAS, LAPACK: BLAS+LAPACK ( OS ) ( )!! BLAS, LAPACK!

25 / 56 BLAS, LAPACK: BLAS+LAPACK ( OS ) ( )!! BLAS, LAPACK!

25 / 56 BLAS, LAPACK: BLAS+LAPACK ( OS ) ( )!! BLAS, LAPACK!

25 / 56 BLAS, LAPACK: BLAS+LAPACK ( OS ) ( )!! BLAS, LAPACK!

25 / 56 BLAS, LAPACK: BLAS+LAPACK ( OS ) ( )!! BLAS, LAPACK!

25 / 56 BLAS, LAPACK: BLAS+LAPACK ( OS ) ( )!! BLAS, LAPACK!

25 / 56 BLAS, LAPACK: BLAS+LAPACK ( OS ) ( )!! BLAS, LAPACK!

25 / 56 BLAS, LAPACK: BLAS+LAPACK ( OS ) ( )!! BLAS, LAPACK!

26 / 56 BLAS? BLAS Basic Linear Algebra Subprograms - - - FORTRAN77 (Reference BLAS) BLAS! http://www.netlib.org/blas

26 / 56 BLAS? BLAS Basic Linear Algebra Subprograms - - - FORTRAN77 (Reference BLAS) BLAS! http://www.netlib.org/blas

26 / 56 BLAS? BLAS Basic Linear Algebra Subprograms - - - FORTRAN77 (Reference BLAS) BLAS! http://www.netlib.org/blas

26 / 56 BLAS? BLAS Basic Linear Algebra Subprograms - - - FORTRAN77 (Reference BLAS) BLAS! http://www.netlib.org/blas

26 / 56 BLAS? BLAS Basic Linear Algebra Subprograms - - - FORTRAN77 (Reference BLAS) BLAS! http://www.netlib.org/blas

27 / 56 Level 1 BLAS BLAS Level 1, 2, 3 Level 1: - (+ ) (DAXPY), (DDOT) y αx + y, (1) dot x T y, (2) 15,,,, 4.

28 / 56 Level 2 BLAS BLAS Level 1, 2, 3 Level 2: - - (DGEMV) (DTRSV) y αax + βy, (3) x A 1 b, (4) 25, 4

29 / 56 Level 3 BLAS BLAS Level 1, 2, 3 Level 3 BLAS - - (DGEMM), - DSYRK, DTRSM 9 C αab + βc (5) C αaa T + βc (6) B αa 1 B (7)

30 / 56 BLAS : s, d, c, z LEVEL1 BLAS zrotg zdcal drotg drot drotm zdrot zswap dswap zdscal dscal zcopy dcopy zaxpy daxpy ddot zdotc zdotu dznrm2 dnrm2 dasum izasum idamax dzabs1 LEVEL2 BLAS zgemv dgemv zgbmv dgbmv zhemv zhbmv zhpmv dsymv dsbmv ztrmv zgemv dgemv zgbmv dgemv zhemv zhbmv zhpmv dsymv dsbmv dspmv ztrmv dtrmv ztbmv ztpmv dtpmv ztrsv dtrsv ztbsv dtbsv ztpsv dger zgeru zgerc zher zhpr zher2 zhpr2 dsyr dspr dsyr2 dspr2 LEVEL3 BLAS zgemm dgemm zsymm dsymm zhemm zsyrk dsyrk zherk zsyr2k dsyr2k zher2k ztrmm dtrmm ztrsm dtrsm

31 / 56 LAPACK? LAPACK(Linear Algebra PACKage),. BLAS. : (LU,, QR,, Schur, Schur ),, CPU OS Fortran 90 3.7.0 1600 web 1 4400! (2017/5/22 ; 1000?) github (https://github.com/reference-lapack) http://www.netlib.org/lapack

32 / 56 BLAS, LAPACK BLAS, LAPACK. Gaussian, Gamess, ADF, VASP CPLEX, NUOPT, GLPK.. Ruby, Python, Perl, Java, C, Mathematica, Maple, Matlab, R, octave, SciLab

33 / 56 Top 500 Top 500: Top 500, LINPACK., DGEMM -, http://www.top500.org/

BLAS, LAPACK : BLAS, LAPACK Reference BLAS CPU OpenBLAS: Zhang Xianyi GotoBLAS2 SandyBridge ARM AMD Power, ICT Loongson-3A, 3B Intel MKL: Intel BLAS LAPACK 2012 Intel Intel OpenBLAS GotoBLAS2 ATLAS:R. Clint Whaley, BLAS 2001 BLAS, BLAS % 10% GPU BLAS, LAPACK: GPU CPU 1W 10,. MAGMA CUDA, Xeon Phi OpenCL GPU BLAS, LAPACK NVIDIA cublas 34 / 56

35 / 56 BLAS, LAPACK : CPU OS BLAS? C? FORTRAN? Python? Ruby? 32bit OS 64bit OS? GPU Linux MacOSX Windows Ubuntu 16.04 x86

36 / 56 BLAS LAPACK Ubuntu 16.04 BLAS, LAPACK C++ -

BLAS LAPACK Ubuntu 16.04 BLAS LAPACK $ sudo apt-get install gfortran g++ libblas-dev liblapack-dev......... $ sudo apt-get install gfortran g++ libblas-dev liblapack-dev... g++ gfortran libblas-dev liblapack-dev : 0 : 0 : 0 : 172 37 / 56

38 / 56 - - DGEMM. 1 8 3 9 8 3 3 3 1.2 A = 2 10 8 B = 3 11 2.3 C = 8 4 8 9 5 1 8 6 1 6 1 2 α = 3, β = 2, C αab + βc. 21 336 70.8 64 514 95 210 31 47.5

39 / 56 - :DGEMM CBLAS BLAS void F77_dgemm(const char *transa, const char *transb, int m, int n, int k, const double * alpha, const double *A, int lda, const double * B, int ldb, const double *beta, double *C, int ldc); transa, transb, transc A, B, C Row major M, N, K alpha, beta

40 / 56 - I #include <stdio.h> extern "C" { #define ADD_ #include <cblas_f77.h> } //Matlab/Octave format void printmat(int N, int M, double *A, int LDA) { double mtmp; printf("[ "); for (int i = 0; i < N; i++) { printf("[ "); for (int j = 0; j < M; j++) { mtmp = A[i + j * LDA]; printf("%5.2e", mtmp); if (j < M - 1) printf(", "); } if (i < N - 1) printf("]; "); else printf("] "); } printf("]"); } int main() { int n = 3; double alpha, beta; double *A = new double[n*n]; double *B = new double[n*n]; double *C = new double[n*n]; A[0+0*n]=1; A[0+1*n]= 8; A[0+2*n]= 3; A[1+0*n]=2; A[1+1*n]=10; A[1+2*n]= 8; A[2+0*n]=9; A[2+1*n]=-5; A[2+2*n]=-1; B[0+0*n]= 9; B[0+1*n]= 8; B[0+2*n]=3; B[1+0*n]= 3; B[1+1*n]=11; B[1+2*n]=2.3; B[2+0*n]=-8; B[2+1*n]= 6; B[2+2*n]=1; C[0+0*n]=3; C[0+1*n]=3; C[0+2*n]=1.2; C[1+0*n]=8; C[1+1*n]=4; C[1+2*n]=8; C[2+0*n]=6; C[2+1*n]=1; C[2+2*n]=-2;

41 / 56 - II printf("# dgemm demo...\n"); printf("a =");printmat(n,n,a,n);printf("\n"); printf("b =");printmat(n,n,b,n);printf("\n"); printf("c =");printmat(n,n,c,n);printf("\n"); alpha = 3.0; beta = -2.0; F77_dgemm("n", "n", &n, &n, &n, &alpha, A, &n, B, &n, &beta, C, &n); printf("alpha = %5.3e\n", alpha); printf("beta = %5.3e\n", beta); printf("ans="); printmat(n,n,c,n); printf("\n"); printf("#check by Matlab/Octave by:\n"); printf("alpha * A * B + beta * C =\n"); delete[]c; delete[]b; delete[]a;

42 / 56 - dgemm_demo.cpp $ g++ dgemm_demo.cpp -o dgemm_demo -lblas -lapack., Octave Matlab & $./dgemm_demo # dgemm demo... A =[ [ 1.00e+00, 8.00e+00, 3.00e+00]; [ 2.00e+00, 1.00e+01, 8.00e+00]; [ 9.00e+00, -5.00e+00, -1.00e+00] ] B =[ [ 9.00e+00, 8.00e+00, 3.00e+00]; [ 3.00e+00, 1.10e+01, 2.30e+00]; [ -8.00e+00, 6.00e+00, 1.00e+00] ] C =[ [ 3.00e+00, 3.00e+00, 1.20e+00]; [ 8.00e+00, 4.00e+00, 8.00e+00]; [ 6.00e+00, 1.00e+00, -2.00e+00] ] alpha = 3.000e+00 beta = -2.000e+00 ans=[ [ 2.10e+01, 3.36e+02, 7.08e+01]; [ -6.40e+01, 5.14e+02, 9.50e+01]; [ 2.10e+02, 3.10e+01, 4.75e+01] ] #check by Matlab/Octave by: alpha * A * B + beta * C

43 / 56 - DGEMM C αab + βc void F77_dgemm(const char *transa, const char *transb, int m, int n, int k, const double * alpha, const double *A, int lda, const double * B, int ldb, const double *beta, double *C, int ldc);? 1 m, n, k 2 lda, ldb, ldc? 3 α, β 1, 0?

44 / 56 - DGEMM :Leading dimension (= )

- DGEMM :Leading dimension 45 / 56

46 / 56 - DGEMM :Leading dimension ( ), leading dimension LDA, LDB A(i,j) A leading dimension A(i + j lda) M N A LDA N

47 / 56 - DGEMM :Leading dimension A A? A (p, q) n, m leading dimension

LAPACK : :DSYEV 3 3 1 2 3 A = 2 5 4 3 4 6 Av i = λ i v i (i = 1, 2, 3) λ 1, λ 2, λ 3 v 1, v 2, v 3 0.40973, 1.57715, 10.83258 v 1 = ( 0.914357, 0.216411, 0.342225) v 2 = (0.040122, 0.792606, 0.608413) v 3 = (0.402916, 0.570037, 0.716042) 48 / 56

49 / 56 DSYEV Fortran dsyev_f77(const char *jobz, const char *uplo, int *n, double *A, int *lda, double *w, double *work, int *lwork, int *info); jobz: uplo: A, lda: A leading dimension w: ( ) work, lwork: info: =0 <0: INFO=-i i >0: INFO=i

#include <iostream> #include <stdio.h> extern "C" int dsyev_(const char *jobz, const char *uplo, int *n, double *a, int *lda, double *w, double *work, int *lwork, int *info); //Matlab/Octave format void printmat(int N, int M, double *A, int LDA) { double mtmp; printf("[ "); for (int i = 0; i < N; i++) { printf("[ "); for (int j = 0; j < M; j++) { mtmp = A[i + j * LDA]; printf("%5.2e", mtmp); if (j < M - 1) printf(", "); } if (i < N - 1) printf("]; "); else printf("] "); } printf("]"); } int main() { int n = 3; int lwork, info; double *A = new double[n*n]; double *w = new double[n]; //setting A matrix A[0+0*n]=1;A[0+1*n]=2;A[0+2*n]=3; A[1+0*n]=2;A[1+1*n]=5;A[1+2*n]=4; A[2+0*n]=3;A[2+1*n]=4;A[2+2*n]=6; printf("a ="); printmat(n, n, A, n); printf("\n"); lwork = -1; double *work = new double[1]; dsyev_("v", "U", &n, A, &n, w, work, &lwork, &info); lwork = (int) work[0]; delete[]work; work = new double[std::max((int) 1, lwork)]; //get Eigenvalue dsyev_("v", "U", &n, A, &n, w, work, &lwork, &info); //print out some results. printf("#eigenvalues \n"); printf("w ="); printmat(n, 1, w, 1); printf("\n"); printf("#eigenvecs \n"); printf("u ="); printmat(n, n, A, n); printf("\n"); printf("#check Matlab/Octave by:\n"); printf("eig(a)\n"); printf("u *A*U\n"); delete[]work; delete[]w; delete[]a; 50 / 56

51 / 56 eigenvalue_demo.cpp $ g++ eigenvalue_demo.cpp -o eigenvalue_demo -lblas -lapack -lgfortran, Octave Matlab & A =[ [ 1.00e+00, 2.00e+00, 3.00e+00];\ [ 2.00e+00, 5.00e+00, 4.00e+00];\ [ 3.00e+00, 4.00e+00, 6.00e+00] ] #eigenvalues w =[ [ -4.10e-01]; [ 1.58e+00]; [ 1.08e+01] ] #eigenvecs U =[ [ -9.14e-01, 2.16e-01, 3.42e-01];\ [ 4.01e-02, -7.93e-01, 6.08e-01];\ [ 4.03e-01, 5.70e-01, 7.16e-01] ] #Check Matlab/Octave by: eig(a) U *A*U

52 / 56 BLAS, LAPACK :Column major or Row major 2 1 ( ) 1 2 3 A = 4 5 6 column major, row major. 1, 4, 2, 5, 3, 6 column major

53 / 56 BLAS, LAPACK :Column major or Row major ( ) 1 2 3 A = 4 5 6 1, 2, 3, 4, 5, 6 row major C, C++ row major

54 / 56 BLAS, LAPACK : 0 1? FORTRAN 1, C, C++, 0. 1 N (FORTRAN), 0 n (C,C++). x i FORTRAN X(I), C x[i 1]. A i, j FORTRAN A(I, J), C column major A[i 1 + ( j 1) lda]

55 / 56 BLAS, LAPACK BLAS, LAPACK C / BLAS, LAPACK

56 / 56 BLAS, LAPACK 1 ( ) BLAS, LAPACK 2 (GPU ) http://nakatamaho.riken.jp/blas_lapack_tutorial_part1.pdf http://nakatamaho.riken.jp/blas_lapack_tutorial_part2.pdf LAPACK/BLAS (https://www.morikita.co.jp/books/book/2226) Matrix Computations Gene H. Golub and Charles F. Van Loan (http://web.mit.edu/ehliu/public/sclark/golub%20g.h.,%20van%20loan%20c.f.- %20Matrix%20Computations.pdf) Accuracy and Stablity of NumericalAlgorithms, Nicholas J. Higham