1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit)

Similar documents
Krylov (b) x k+1 := x k + α k p k (c) r k+1 := r k α k Ap k ( := b Ax k+1 ) (d) β k := r k r k 2 2 (e) : r k 2 / r 0 2 < ε R (f) p k+1 :=

Copyright c Tomonori Kouya BNCpack LGPL3

86 8 MPIBNCpack 15 : int n, myid, numprocs, i; 16 : double pi, start_x, end_x; 17 : double startwtime = 0.0, endwtime; 18 : int namelen; 19 : char pro

Copyright c Tomonori Kouya BNCpack LGPL3


115 9 MPIBNCpack 9.1 BNCpack 1CPU X = , B =

C による数値計算法入門 ( 第 2 版 ) 新装版 サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. このサンプルページの内容は, 新装版 1 刷発行時のものです.

untitled

44 6 MPI 4 : #LIB=-lmpich -lm 5 : LIB=-lmpi -lm 7 : mpi1: mpi1.c 8 : $(CC) -o mpi1 mpi1.c $(LIB) 9 : 10 : clean: 11 : -$(DEL) mpi1 make mpi1 1 % mpiru

(Version: 2017/4/18) Intel CPU 1 Intel CPU( AMD CPU) 64bit SIMD Inline Assemler Windows Visual C++ Linux gcc 2 FPU SSE2 Intel CPU do

Bessel ( 06/11/21) Bessel 1 ( ) 1.1 0, 1,..., n n J 0 (x), J 1 (x),..., J n (x) I 0 (x), I 1 (x),..., I n (x) Miller (Miller algorithm) Bess

FFTSS Library Version 3.0 User's Guide

untitled

64bit SSE2 SSE2 FPU Visual C++ 64bit Inline Assembler 4 FPU SSE2 4.1 FPU Control Word FPU 16bit R R R IC RC(2) PC(2) R R PM UM OM ZM DM IM R: reserved

ex01.dvi

ohp1.dvi

J.JSSAC Vol. 7, No. 2, Mathematica Maple,., Open asir Open xxx asir. Open xxx Open asir, asir., Open xxx, Linux Open asir Open sm1 (kan/sm1). C

b 2

AutoTuned-RB

x h = (b a)/n [x i, x i+1 ] = [a+i h, a+ (i + 1) h] A(x i ) A(x i ) = h 2 {f(x i) + f(x i+1 ) = h {f(a + i h) + f(a + (i + 1) h), (2) 2 a b n A(x i )

Microsoft Word - Sample_CQS-Report_English_backslant.doc

OpenCV Windows(cygwin) Linux USB PC [1] Inel OpenCV OpenCV 1 Windows Linux OpenCV (a) (b)2 (c) (d) 1: OpenCV 1

ex01.dvi

fortranfunction2.qxd

Intel Memory Protection Extensions(Intel MPX) x86, x CPU skylake 2015 Intel Software Development Emulator 本資料に登場する Intel は Intel Corp. の登録

ESD-巻頭言[ ].indd

07-二村幸孝・出口大輔.indd

‚æ4›ñ

Informatics 2014

58 7 MPI 7 : main(int argc, char *argv[]) 8 : { 9 : int num_procs, myrank; 10 : double a, b; 11 : int tag = 0; 12 : MPI_Status status; 13 : 1 MPI_Init

64bit SSE2 SSE2 FPU Visual C++ 64bit Inline Assembler 4 FPU SSE2 4.1 FPU Control Word FPU 16bit R R R IC RC(2) PC(2) R R PM UM OM ZM DM IM R: reserved

Second-semi.PDF

RedHat OpenFOAM OpenFOAM ver 2.3 RedHat(RHEL)

Untitled

20 H8/3069LAN H. Fukura

min. z = 602.5x x 2 + 2

/* sansu1.c */ #include <stdio.h> main() { int a, b, c; /* a, b, c */ a = 200; b = 1300; /* a 200 */ /* b 200 */ c = a + b; /* a b c */ }

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2

LDR-MA16FU2_WM.n.[.h.E.F.A.}.j...A.._Win.p65

Informatics 2015

VNSTProductDes3.0-1_jp.pdf

14 2 Scilab Scilab GUI インタグラフ プリタ描画各種ライブラリ (LAPACK, ODEPACK, ) SciNOTES ハードウェア (CPU, GPU) 21 Scilab SciNotes 呼び出し 3 変数ブラウザ 1 ファイルブラウザ 2 コンソール 4 コマンド履歴

XMPによる並列化実装2

(300, 150) 120 getchar() HgBox(x, y, w, h) (x, y), w, h #include <stdio.h> #include <handy.h> int main(void) { int i; double w, h; } HgO

θ (t) ω cos θ(t) = ( : θ, θ. ( ) ( ) ( 5) l () θ (t) = ω sin θ(t). ω := g l.. () θ (t) θ (t)θ (t) + ω θ (t) sin θ(t) =. [ ] d dt θ (t) ω cos θ(t


DKA ( 1) 1 n i=1 α i c n 1 = 0 ( 1) 2 n i 1 <i 2 α i1 α i2 c n 2 = 0 ( 1) 3 n i 1 <i 2 <i 3 α i1 α i2 α i3 c n 3 = 0. ( 1) n 1 n i 1 <i 2 < <i

Gauss

develop

Source: Intel.Config: Pentium III Processor-Intel Seattle SE440BX-2, 128MB PC100 CL2 SDRAM Intel 440BX-2 Chipset Platform- Diamond Viper 550 /

ストリーミング SIMD 拡張命令2 (SSE2) を使用した SAXPY/DAXPY

untitled

新版明解C言語 実践編

InterSafe Personal_v2.3 ユーザーズガイド_初版


( ) a C n ( R n ) R a R C n. a C n (or R n ) a 0 2. α C( R ) a C n αa = α a 3. a, b C n a + b a + b ( ) p 8..2 (p ) a = [a a n ] T C n p n a

新版 明解C++入門編

FileMaker Server Getting Started Guide

DVD CD SoundRipper SoundRipper DVD SoundRipper DVD SoundRipper DVD CD DVD DVD DVD CD CD DVD " CD/DVD" DVD CSS DVD SoundRipper DVD-Video DVD DVD-ROM DV

EPSON EasyMP Multi PC Projection Ver.1.11 Operation Guide

/* do-while */ #include <stdio.h> #include <math.h> int main(void) double val1, val2, arith_mean, geo_mean; printf( \n ); do printf( ); scanf( %lf, &v

SQUFOF NTT Shanks SQUFOF SQUFOF Pentium III Pentium 4 SQUFOF 2.03 (Pentium 4 2.0GHz Willamette) N UBASIC 50 / 200 [

C C UNIX C ( ) 4 1 HTML 1

[1] #include<stdio.h> main() { printf("hello, world."); return 0; } (G1) int long int float ± ±

211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G

3.1 stdio.h iostream List.2 using namespace std C printf ( ) %d %f %s %d C++ cout cout List.2 Hello World! cout << float a = 1.2f; int b = 3; cout <<

DPD Software Development Products Overview

GPU.....

mate10„”„õŒì4

2.2 Sage I 11 factor Sage Sage exit quit 1 sage : exit 2 Exiting Sage ( CPU time 0m0.06s, Wall time 2m8.71 s). 2.2 Sage Python Sage 1. Sage.sage 2. sa

untitled

表1票4.qx4

福祉行財政と福祉計画[第3版]

II ( ) prog8-1.c s1542h017%./prog8-1 1 => 35 Hiroshi 2 => 23 Koji 3 => 67 Satoshi 4 => 87 Junko 5 => 64 Ichiro 6 => 89 Mari 7 => 73 D

01_OpenMP_osx.indd

PC Link Tool PC Link Tool PC Link Tool PC Link Tool

C 2 / 21 1 y = x 1.1 lagrange.c 1 / Laglange / 2 #include <stdio.h> 3 #include <math.h> 4 int main() 5 { 6 float x[10], y[10]; 7 float xx, pn, p; 8 in

MBLAS¤ÈMLAPACK; ¿ÇÜĹÀºÅÙÈǤÎBLAS/LAPACK¤ÎºîÀ®

EPSON Easy Interactive Tools Ver.2 Operation Guide

Microsoft Word - C.....u.K...doc

C ( ) C ( ) C C C C C 1 Fortran Character*72 name Integer age Real income 3 1 C mandata mandata ( ) name age income mandata ( ) mandat

Ver.2.00

double float

Informatics 2010.key

EPSON EasyMP Multi PC Projection Ver.1.10 Operation Guide


HP xw9400 Workstation

LP-S820

卒業論文

02_C-C++_osx.indd

DVDデュプリケータ ユーザーズマニュアル

PC Windows 95, Windows 98, Windows NT, Windows 2000, MS-DOS, UNIX CPU

DPCK-US10

VM-53PA1取扱説明書

SonicStage Ver. 2.0

( CUDA CUDA CUDA CUDA ( NVIDIA CUDA I

pptx

EPSON EasyMP Multi PC Projection Ver.1.00 Operation Guide

ACDSee-Press-Release_0524

Transcription:

GNU MP BNCpack tkouya@cs.sist.ac.jp 2002 9 20 ( ) Linux Conference 2002 1

1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit) 10 2 2

3 4 5768:9:; <>= %? @BADCEGFH-I:JLKNMNOQP R )TSVU!" # %$ & " # %$ & " # %' (*) " # %+-, " # " #./ " # " # 0 1 " # %' (*) " # 2, " # 1: IEEE754 IEEE754 3

CC "!$#&%('*)+,.- /1032 46587:93; )+,.-</1032 = >? @BA@DC = >? @BA @ C E8@ EB@ 2: " #$%'& (*) +%'&!! 3: 4

PC IEEE754 5

open source software proprietary software GNU MP[6]( GMP ) GMP BNCpack[5] 6

2 (integer) (floating-point number) CPU (bit ) bit N 2 N+1 (overflow) 7

(base) (mantissa) (exponent) (round) 8

IEEE754 standard 2 IEEE754 IEEE754 32 1 8 24(23) 64 1 11 53(52) bit 1bit bit 1 1 2 p ε M ε M = 2 (p 1) (1) 0 ε M IEEE754 standard 2 23 1.1920928955078125 10 7 2 52 2.2204460492503130808472633361816 10 16 10 7, 16 ε M 1 9

2 IEEE754 standard 2 10

2.1 ( ) 3 (ill-conditioned) n Ax = b A cond(a) = A A 1 x 3 2 11

Logistic x 0 = 0.7501 x n+1 := 4x n (1 x n ) {x n } 1 x n x n x 100 = 0.078817989 IEEE754 0.515390006286616020 [2] 12

2.2 A x A = 100 99 1 99 99 1... 1 1 1, x = Conjugate-Gradient IEEE754 (float), (double) 128bit, 256bit, 512bit, 1024bit r k 2 = b Ax k 2 r 0 2 4 0 1. 99 13

b-axi 2 1 IEEE754 float 1e-05 IEEE754 double 1e-10 1e-15 gmp 128bits gmp256bits 1e-20 gmp 1024bits gmp 512bits 0 20 40 60 80 4: CG IEEE754 CG 512bit IEEE754 IEEE754 14

10 15 Pentium IV IEEE GMP 4.1 (bits) 53 128 256 512 1024 107 63 52 48 47 0.008 0.019 0.018 0.028 0.042 15

3 GNU MP GNU MP( GMP ) 1991 ANSI C 2002 9 4.1 (mpz t) (mpq t) Version 2 (mpf t) BNCpack GMP C 2048bit #include <stdio.h> #include "gmp.h" main() { int i; mpf_t x[102]; mpf_set_default_prec(2048); mpf_init_set_str(x[0], "0.7501", 10); for(i = 1; i <= 101; i++) mpf_init(x[i]); } for(i = 0; i <= 100; i++) { mpf_ui_sub(x[i+1], 1UL, x[i]); mpf_mul(x[i+1], x[i], x[i+1]); mpf_mul_ui(x[i+1], x[i+1], 4UL); if(i%10 == 0) gmp_printf("%5d %58.50Fe\n", i, x[i]); } 16

2 17

Version 4 C++ GMP Project MPFR [7] 4 C++ C #include <iostream.h> #include "gmpxx.h" main() { mpf_set_default_prec(2048); int i; mpf_class x[102]; x[0] = "0.7501"; } for(i = 0; i <= 100; i++) { x[i+1] = 4 * x[i] * (1 - x[i]); if(i%10 == 0) gmp_printf("%5d %58.50Fe\n", i, x[i].get_mpf_t()); } 4 MPFR GMP mpf t IEEE754 standard 18

GMP MPFR 1CPU PC GMP BNCpack GMP CPU Intel Pentium III 750MHz RAM 384MB OS Windows XP Professional C compiler GCC 2.95 GMP Ver.4.0.1 (./configure; make; make install ) Software MuPAD Pro 2.0, Mathematica 4.1 MPFR st = cputime(); for (i=0;i<n;i++) mpf_add(z, x, y); printf("x+y took %1.2ems\n", (double)(cputime()-st)/n); 1 2 19

CPU GMP order GMP GMP GMP Knuth[9] 20

1: IEEE754 IEEE x + y 0.000010 x y 0.000011 x y 0.000009 x/y 0.000053 x 0.00157 2: MuPAD Pro Mathematica GMP 10 100 x + y 0.0060 0.0054 0.00040 x y 0.0070 0.0092 0.00050 x y 0.011 0.011 0.0022 x/y 0.012 0.073 0.0041 x 0.036 0.097 0.0076 10 1000 x + y 0.010 0.0090 0.002 x y 0.020 0.0161 0.001 x y 0.27 0.21 0.079 x/y 0.32 0.63 0.13 x 0.29 0.90 0.085 10 10000 x + y 0.10 0.040 0.01 x y 0.10 0.07 0.01 x y 25 5.2 2.62 x/y 26 25 5.53 x 93 29 3.52 21

CPU Intel Pentium IV 2.2GHz RAM 512MB OS RedHat Linux 7.3 C compiler GCC 2.96 GMP & MPFR Ver.3.1.1, Ver.4.1 (./configure; make; make install ) 22

GMP 3.1.1, 4.1, MPFR GMP 3.1.1 GMP 4.1 GMP 4.1& MPFR 10 100 x + y 0.00023 0.00012 0.00020 x y 0.00023 0.00012 0.00036 x y 0.00184 0.00057 0.00060 x/y 0.00297 0.00168 0.00187 x 0.00692 0.00264 0.00231 10 1000 x + y 0.00080 0.00045 0.00057 x y 0.00079 0.00043 0.00097 x y 0.0815 0.0242 0.0222 x/y 0.145 0.0442 0.0426 x 0.195 0.0324 0.0316 10 10000 x + y 0.0059 0.0034 0.0040 x y 0.0058 0.0033 0.0070 x y 2.80 0.859 0.865 x/y 5.67 1.83 1.83 x 9.94 1.22 1.20 23

4 BNCpack BNCpack Basic Numerical Computation PACKage GMP Version 2 GMP GMP Fortran GMP Version 3 ANSI C C(not C++) 24

buggy 1. GMP (Ver.0.3 MPFR ) 2. (C++ Complex GSL[8] ) 3. ( ( ) ) 4. ( CG ) 5. ( ) 6. (Newton Newton, Regula-Falsi ) 7. (DKA ) 8. ( Romberg ) 9. ( Runge-Kutta ) 25

BNCpack GMP typdef MPFCmplx MPFPoly MPFArray, CMPFArray( ) MPFStack MPFVector MPFMatrix typedef struct{ unsigned long int prec; mpf_t re; mpf_t im; } mpfcmplx; typedef mpfcmplx *MPFCmplx; prec (bit ) BNCpack GMP mpf t BNCpack 26

4.1 #ifdef USE_GMP /* GMP */ /* */ MPFCmplx mpfca, mpfcb, mpfcc; /* */ /* Default: 128bit */ set_bnc_default_prec(128); /* Default */ mpfca = init_mpfcmplx(); mpfcb = init_mpfcmplx(); /* 256bit */ mpfcc = init2_mpfcmplx(256); /* mpfa := 1 + 2i */ /* mpfb := 3 + 4i */ /* mpfc := rand() + rand() i */ set_real_mpfcmplx_ui(mpfca, 1); set_image_mpfcmplx_ui(mpfca, 2); set_real_mpfcmplx_ui(mpfcb, 3); set_image_mpfcmplx_ui(mpfcb, 4); set_real_mpfcmplx_ui(mpfcc, rand()); set_image_mpfcmplx_ui(mpfcc, rand()); /* mpfcc := mpfca + mpfcb */ add_mpfcmplx(mpfcc, mpfca, mpfcb); /* */ printf("mpfca + mpfcb = "); print_mpfcmplx(mpfcc); /* */ free_mpfcmplx(mpfca); free_mpfcmplx(mpfcb); free_mpfcmplx(mpfcc); #endif 27

4.2 mpfmat a, mpfmat b #ifdef USE_GMP /* */ MPFMatrix mpfmat, mpfmat_a, mpfmat_b; mpf_t tmp, sqrt2; long int dim, row_dim, col_dim, i, j; /* Default: 128bit */ set_bnc_default_prec(128); /* */ mpfmat = init_mpfmatrix(row_dim, col_dim); mpfmat_a = init_mpfmatrix(row_dim, col_dim); mpfmat_b = init_mpfmatrix(row_dim, col_dim); mpf_init(tmp); mpf_init(sqrt2); mpf_sqrt_ui(sqrt2, 2UL); /* mpfmat[i][j] := sqrt(2) * (i+1) / (j+1) */ for(i = 0; i < row_dim; i++) { for(j = 0; j < col_dim; j++) { mpf_mul_ui(tmp, sqrt2, i+1); mpf_div_ui(tmp, tmp, j+1); set_mpfmatrix_ij(mpfmat, i, j, tmp); } } /* mpfmat_a := mpfmat */ subst_mpfmatrix(mpfmat_a, mpf_mat); /* mpfmat_b := mpfmat * mpfmat_a */ mul_mpfmatrix(mpfmat_b, mpfmat, mpfmat_a); /* */ printf("mpfmat_b:\n"); print_mpfmatrix(mpfmat_b); /* */ mpf_clear(tmp); mpf_clear(sqrt2); free_mpfmatrix(mpfmat); free_mpfmatrix(mpfmat_a); free_mpfmatrix(mpfmat_b); #endif 28

4.3 LU #ifdef USE_GMP /* Default: 256bit */ set_bnc_default_prec(256); /* */ mpf_init(reps); mpf_init(aeps); mpfa = init_mpfmatrix(dim, DIM); mpfb = init_mpfvector(dim); mpfx = init_mpfvector(dim); mpfans = init_mpfvector(dim); /* A b, x */ get_mpfproblem(mpfa, mpfb, mpfans); /* A */ print_mpfmatrix(mpfa); /* LU */ ret_mpf = MPFLUdecomp(mpfa); ret_mpf = SolveMPFLS(mpfx, mpfa, mpfb); /* */ for(i = 0; i < DIM; i++) { printf("%5ld ", i); mpf_out_str(stdout, 10, 0, \ get_mpfvector_i(mpfx, i)); printf(" "); mpf_out_str(stdout, 10, 0, \ get_mpfvector_i(mpfans, i)); printf("\n"); } /* */ mpf_clear(reps); mpf_clear(aeps); free_mpfmatrix(mpfa); free_mpfvector(mpfb); free_mpfvector(mpfx); free_mpfvector(mpfans); #endif 29

5 BNCpack BNCpack 1. ANSI C ANSI C GSL[8] C++ template 2. 3. BLAS LAPACK[10] 4. 30

5.1 BNCpack C GMP 2 C++ C++ STL 31

5.2 GSL(GNU Scientific Library)[8] BNCpack Bessel IEEE754 standard LAPACK(+BLAS) LAPACK 32

5.3 ( ) Mathematica Mathematica n := 50; digits := 100; a := Table[N[Sqrt[2]*i/j, digits], {i, 1, n}, {j, 1, n}]; b := Table[N[Sqrt[2]*i/j, digits], {i, 1, n}, {j, 1, n}]; time := Timing[a. b;]; Print[time]; Clear[a, b, time, n, digits]; 33

10 50 100 200 Mathematica 0.03 0.04 0.05 BNCpack 0.00 0.01 0.01 100 50 100 200 Mathematica 12.7 15.2 24.0 BNCpack 2.12 3.57 8.19 50 50 100 200 Mathematica 1.87 2.17 3.4 BNCpack 0.21 0.36 0.98 200 50 100 200 Mathematica 87.8 110.7 178.6 BNCpack 17.0 27.8 64.5 Mathematica 2 3 Mathematica 10 10 1000 10000 100000 Mathematica 0.25 5.79 148.2 BNCpack 0.08 2.82 97.9 Math./BNC 3.1 2.1 1.5 34

5.4 1. 2. 2 ( 3) 1 35

6 CPU Mathematica, MuPAD GMP GMP GMP BNCpack 36

BNCpack bug 1. 2. 3. ANSI C 4. BNCpack free open 37

[1],,, 1980. [2], http://www.pas-net.jp/nasoft/ [3],,, 1973. [4],,, 1985. [5] BNCpack, http://member.nifty.ne.jp/tkouya/na/bnc/ [6] GNU MP, http://swox.com/gmp/ [7] The MPFR Library, http://www.mpfr.org/ [8] The GNU Scientific Library, http://sources.redhat.com/gsl/ [9] D.E.Knuth/, /,,1986. [10] LAPACK, http://www.netlib.org/lapack/ [11] MuPAD, http://www.mupad.de/ [12] Wolfram Research, http://www.wolfram.com/ 38