Floating-Point Arithmetic Using the Intel(R) Architecture (IA) Floating-Point Unit (FPU), Streaming SIMD Extensions (SSE), and Streaming SIMD Extensions 2 (SSE2)
2 Fax: * Copyright Intel Corporation 1999, /12/06 2
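References

[1] IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985.
[2] (title not recoverable from the transcription), 1999.
[3] Visual C++ On-Line Manual, Microsoft Corporation, 1999.

This document covers floating-point arithmetic on the floating-point unit (FPU, also called x87), the Streaming SIMD Extensions (SSE), and the Streaming SIMD Extensions 2 (SSE2), with reference to the IEEE standard [1].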
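1. Introduction

Starting with the Pentium III processor, the Intel Architecture (IA) includes the Streaming SIMD Extensions (SSE), aimed at 3D graphics and similar workloads, and later the Streaming SIMD Extensions 2 (SSE2). Both follow the SIMD (Single Instruction, Multiple Data) model. SSE adds SIMD operations on single precision floating-point values next to the 64-bit integer SIMD operations of MMX technology. SSE2 adds SIMD operations on double precision floating-point values and widens integer SIMD from the 64-bit MMX registers to 128 bits. The x87 FPU, SSE, and SSE2 can all be used for floating-point computation on IA-32 processors.

Section 2 describes the IA floating-point unit (FPU): number formats, control and status words, exceptions, NaN handling, and instructions, and its relation to the IEEE standard [1]. Section 3 covers the same topics for SSE and Section 4 for SSE2. Section 5 compares the FPU, SSE, and SSE2.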
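2. The IA Floating-Point Unit (FPU)

A binary floating-point number f has a sign bit s (0 for positive, 1 for negative), an exponent e in the range [E_min, E_max], and a significand in [1, 2) for normalized numbers (integer bit J = 1):

$$f = (-1)^s \cdot 2^e \cdot (b_0.b_1 b_2 \ldots b_{N-1})$$

where N is the precision in bits. One ulp (unit-in-the-last-place) of f is

$$1\ \mathrm{ulp} = 2^{e-N+1}$$

The IA FPU supports three floating-point formats: single precision, double precision, and double-extended precision (the first two are the basic formats of the IEEE standard [1]). Table 1 lists their parameters; the values shown are the standard ones for these formats.

Table 1: IA-32 floating-point format parameters
  Parameter                 Single    Double    Double-extended
  Format width (bits)       32        64        80
  Precision N (bits)        24        53        64
  Exponent width (bits)     8         11        15
  E_min                     -126      -1022     -16382
  E_max                     127       1023      16383

Besides normalized numbers, each format encodes the following special values:
- Zeros: signed, with integer bit and fraction equal to 0.
- Denormalized numbers: exponent E_min and integer bit 0.
- Infinities: signed.
- NaN (Not a Number): a NaN whose most significant fraction bit is 0 is a signaling NaN (SNaN); one whose most significant fraction bit is 1 is a quiet NaN (QNaN). The QNaN real indefinite (sign = 1, exponent = 11...1, significand = 110...0) is the value the FPU returns for a masked invalid operation that has no NaN operand.

The FPU has eight 80-bit data registers, which hold values in double-extended format, organized as a stack. The TOP field of the FPU status word identifies the register that is the current top of the stack, ST(0); the others are addressed relative to it as ST(1), ST(2), ..., ST(7). A load pushes a value: the old ST(0) becomes ST(1), the old ST(1) becomes ST(2), and so on. A pop does the reverse: ST(1) becomes ST(0), ST(2) becomes ST(1), and so on. The FXCH instruction exchanges ST(0) with another stack register.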
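Besides the eight data registers, the FPU state consists of the 16-bit control, status, and tag words, the last instruction pointer and last data pointer (48 bits each), and the 11-bit opcode of the last instruction.

The MMX registers are aliased onto the FPU data registers. MMX instructions set TOP to 0 and mark the tag word entries valid, so after a block of MMX code the EMMS instruction (which marks all tag word entries empty) must be executed before FPU instructions are used again.

2.2 The FPU Control Word and Status Word

The 16-bit FPU control word (Figure 1) contains:
- Bits 0 to 5: the exception masks IM, DM, ZM, OM, UM, PM.
- Bits 8 and 9: precision control (PC). PC=00B selects 24 bits, PC=10B selects 53 bits, PC=11B selects 64 bits; PC=01B is reserved.
- Bits 10 and 11: rounding control (RC), selecting one of the four rounding modes of the IEEE standard [1]. RC=00B rounds to nearest, RC=01B rounds down, RC=10B rounds up, RC=11B rounds toward zero.
- Bit 12 (X): the infinity control bit.

For masked overflow and underflow ([1], Section 7.4) the delivered result depends on the rounding mode: either an infinity or the largest finite number (exponent E_max) on overflow, and either zero or the smallest representable number (exponent E_min) on underflow.

Figure 1: FPU control word
  bit:   15-13      12   11-10   9-8   7-6        5    4    3    2    1    0
         reserved   X    RC      PC    reserved   PM   UM   OM   ZM   DM   IM

The 16-bit FPU status word (Figure 2) contains:
- Bits 0 to 5: the exception flags IE, DE, ZE, OE, UE, PE. A flag is set to 1 when the corresponding exception is detected.
- Bit 6 (SF): stack fault. With IE and SF both set, C1 distinguishes stack underflow (C1 = 0) from stack overflow (C1 = 1).
- Bit 7 (ES): error summary.
- Bits 8, 9, 10, and 14: the condition codes C0, C1, C2, C3. When PE is set, C1 = 1 indicates that the result was rounded up [2].
- Bits 11 to 13: TOP, the top-of-stack pointer.
- Bit 15 (B): FPU busy.

Figure 2: FPU status word
  bit:   15   14   13-11   10   9    8    7    6    5    4    3    2    1    0
         B    C3   TOP     C2   C1   C0   ES   SF   PE   UE   OE   ZE   DE   IE

Example 1: Following the IEEE standard [1], the product of two single precision numbers a and b (E_min = -126) is computed in single precision (24 bits) in each of the four rounding modes. The operand and result values were not preserved in this transcription (Example 4 implements this example and gives a and b in hexadecimal). The flags raised are:
- round to nearest: inexact (P)
- round down: inexact and underflow (P, U)
- round up: inexact (P)
- round toward zero: inexact and underflow (P, U)

The FPU recognizes six classes of floating-point exceptions: invalid operation (#I), divided into stack faults (#IS) and invalid arithmetic operations (#IA); denormal operand (#D); divide-by-zero (#Z); overflow (#O);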
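underflow (#U); and inexact result, also called precision (#P). Each of the six has a flag in the FPU status word and a mask bit in the FPU control word. When an exception is masked, the FPU sets the flag, delivers a default result, and continues. When it is unmasked, the flag is set and a software exception handler is invoked.

Invalid operation, denormal operand, and divide-by-zero (#I, #D, #Z) are detected before the operation is carried out. Overflow, underflow, and inexact result are detected after it.

Masked responses:
- Invalid operation: an SNaN operand is converted to a QNaN; with no NaN operand the result is the QNaN real indefinite.
- Divide-by-zero: a correctly signed infinity is returned, as specified by the IEEE standard [1].
- Denormal operand: the computation proceeds with the denormal operand.
- Overflow and underflow: the result depends on the rounding mode, as described for the control word.
- Inexact result: the rounded result is delivered, and C1 records whether it was rounded up.

Unmasked overflow and underflow behave differently depending on the destination. When the destination is an FPU stack register, the result is delivered with the full precision selected by PC and with its 15-bit exponent scaled back into range (see Example 9 in Section 2.7). When the destination is memory, nothing is stored and the source operand is left unchanged on the FPU stack (see Example 8 in Section 2.7). In both cases C1 indicates the rounding direction.

A pending unmasked floating-point exception is reported when the next waiting floating-point instruction or a WAIT/FWAIT instruction executes, not by the instruction that caused it.

Example 2: The product of the same two single precision numbers a and b is computed with two instructions, FMUL followed by FST, with 64-bit precision control. FMUL produces the product on the FPU stack in double-extended format, and the 32-bit FST then rounds it to single precision. The inexact (P) and underflow (U) flags are raised by FST. The operand and result values were not preserved in this transcription.

2.4 Software Exception Handling

Floating-point exceptions can be reported to software in two ways, selected by the NE bit of control register CR0:
- Native mode (CR0.NE=1): an unmasked exception is reported through exception 16 (#MF) when the next waiting FPU instruction, WAIT, or MMX instruction executes.
- MS-DOS compatibility mode (CR0.NE=0): the processor asserts the FERR# pin, and external logic (a programmable interrupt controller, PIC) delivers an interrupt through INTR#. This mode exists for compatibility with the Intel486 processor and earlier systems; the IGNNE# pin lets the processor ignore the pending error.

2.5 NaN Handling

A QNaN operand propagates through most operations without raising an exception. An SNaN operand raises the invalid operation exception; if the exception is masked, the SNaN is converted to a QNaN. An SNaN can be loaded into an FPU register without raising an exception with FRSTOR, which restores all eight registers from memory. When both operands are NaNs, the result follows Table 2. The rules apply to the Pentium Pro processor and later IA processors.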
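Table 2: FPU results for operations with QNaN and SNaN operands
  Source operands                               Result
  SNaN and QNaN                                 the QNaN source operand
  Two SNaNs                                     the SNaN with the larger significand, converted to a QNaN
  Two QNaNs                                     the QNaN with the larger significand
  SNaN and a real value                         the SNaN, converted to a QNaN
  QNaN and a real value                         the QNaN source operand
  No NaN operand, invalid operation detected    the QNaN real indefinite

2.6 FPU Instructions

The FPU instructions fall into the groups below [2]. For each instruction the floating-point exceptions it can raise are listed (I = invalid, D = denormal, Z = divide-by-zero, O = overflow, U = underflow, P = inexact).

1. Data transfer
- FLD: floating-point load. Pushes a value from memory or from an FPU register, converted to 80-bit format, onto the stack as ST(0). Exceptions: I, D.
- FST/FSTP: floating-point store. Stores ST(0) to memory or to an FPU register; FSTP also pops the stack and can store the 80-bit format. Exceptions: I, O, U, P.
- FXCH: exchanges ST(i) and ST(0). Exceptions: none listed.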
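- FCMOVcc: moves ST(i) to ST(0) depending on the EFLAGS bits CF, ZF, PF. Exceptions: none listed.
- FILD: converts an integer in memory and pushes it onto the FPU stack as ST(0). Exceptions: none listed.
- FIST/FISTP: stores ST(0) to memory as an integer; FISTP also pops and supports 64-bit integers. Exceptions: I, P.
- FBLD: loads an 80-bit packed BCD value onto the FPU stack. Exceptions: none listed.
- FBSTP: stores ST(0) to memory as an 80-bit packed BCD value and pops ST(0). Exceptions: I, P.

2. Load constants
- FLDZ, FLD1, FLDPI, FLDL2T, FLDL2E, FLDLG2, FLDLN2: push 0.0, 1.0, pi, log2(10), log2(e), log10(2), or loge(2) onto the stack as ST(0). Exceptions: none listed.

3. Basic arithmetic
- FADD/FADDP: floating-point add. Adds ST(0) and another operand (a stack register or a memory operand); FADDP also pops. Exceptions: I, D, O, U, P.
- FIADD: adds an integer in memory to ST(0). Exceptions: I, D, O, U, P.
- FSUB/FSUBP/FSUBR/FSUBRP: floating-point subtract. FSUB/FSUBP take the same operand forms as FADD/FADDP; FSUBR/FSUBRP subtract in the reverse order. Exceptions: I, D, O, U, P.
- FISUB/FISUBR: subtract integer (converted to double-extended format) from floating-point. Same operand form as FIADD; FISUBR subtracts in the reverse order. Exceptions: I, D, O, U, P.
- FMUL/FMULP: floating-point multiply. Same operand forms as FADD/FADDP. Exceptions: I, D, O, U, P.
- FIMUL: multiply floating-point and integer (converted to double-extended format). Same operand form as FIADD. Exceptions: I, D, O, U, P.
- FDIV/FDIVP/FDIVR/FDIVRP: floating-point divide. FDIV/FDIVP take the same operand forms as FADD/FADDP; FDIVR/FDIVRP divide in the reverse order. Exceptions: I, D, Z, O, U, P.
- FIDIV/FIDIVR: divide floating-point by integer (converted to double-extended format). Same operand form as FIADD; FIDIVR divides in the reverse order. Exceptions: I, D, Z, O, U, P.
- FSQRT: square root of ST(0). Exceptions: I, D, P.
- FRNDINT: rounds ST(0) to an integer according to the current rounding mode. Exceptions: I, D, O, U, P as listed in the source.
- FABS: absolute value of ST(0). Exceptions: none listed.
- FCHS: changes the sign of ST(0). Exceptions: none listed.
- FPREM: partial remainder of ST(0) divided by ST(1), stored in ST(0). Exceptions: I, D, U.
- FPREM1: IEEE partial remainder of ST(0) divided by ST(1), stored in ST(0). Exceptions: I, D, U.
- FXTRACT: separates ST(0) into its exponent and its significand (the significand is returned with biased exponent 0x3fff). Exceptions: I, D, Z.

4. Comparison
- FCOM/FCOMP/FCOMPP: compare real. Compares ST(0) with another operand and sets C3, C2, C0 in the FPU status word; FCOMP pops ST(0) and FCOMPP pops two registers. A QNaN operand raises the invalid exception. Exceptions: I, D.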
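- FUCOM/FUCOMP/FUCOMPP: unordered compare real. Like FCOM/FCOMP/FCOMPP, except that a QNaN operand does not raise the invalid exception. Exceptions: I, D.
- FICOM/FICOMP: compares ST(0) with an integer in memory and sets C3, C2, C0 in the FPU status word; FICOMP also pops ST(0). Exceptions: I, D.
- FCOMI/FCOMIP: compares two FPU registers and sets EFLAGS; FCOMIP also pops ST(0). A QNaN operand raises the invalid exception. Exceptions: I.
- FUCOMI/FUCOMIP: like FCOMI/FCOMIP, except that a QNaN operand does not raise the invalid exception. Exceptions: I.
- FTST: compares ST(0) with 0.0 and sets C3, C2, C0. Exceptions: I, D.
- FXAM: classifies ST(0) (NaN, zero, and so on) and sets C3, C2, C0. Exceptions: none listed.

5. Trigonometric
- FSIN: replaces ST(0) with its sine. Exceptions: I, D, U, P.
- FCOS: replaces ST(0) with its cosine. Exceptions: I, D, P (U does not occur).
- FSINCOS: replaces ST(0) with its sine and pushes its cosine onto the FPU stack. Exceptions: I, D, U, P.
- FPTAN: tangent. Replaces ST(0) with tan(ST(0)) and pushes 1.0 onto the FPU stack; the operand magnitude must be less than 2^63. Exceptions: I, D, U, P.
- FPATAN: arctangent. Replaces ST(1) with arctan(ST(1)/ST(0)) and pops ST(0). Exceptions: I, D, U, P.

6. Logarithmic, exponential, and scale
- FYL2X: replaces ST(1) with ST(1) * log2(ST(0)) and pops ST(0). Exceptions: I, D, Z, O, U, P.
- FYL2XP1: replaces ST(1) with ST(1) * log2(ST(0) + 1.0) and pops ST(0). Exceptions: I, D, O, U, P.
- F2XM1: replaces ST(0) with 2^ST(0) - 1. Exceptions: I, D, U, P.
- FSCALE: scales ST(0) by the power of two given by ST(1). Exceptions: I, D, O, U, P.

7. FPU control (these instructions raise no floating-point exceptions of their own)
- FINIT/FNINIT: initialize the FPU after checking (FINIT) or without checking (FNINIT) for pending exceptions; precision control is set to 64 bits.
- FLDCW: loads the 2-byte FPU control word from memory.
- FSTCW/FNSTCW: store the FPU control word to a 2-byte memory location, with (FSTCW) or without (FNSTCW) checking for pending exceptions.
- FSTSW/FNSTSW: store the FPU status word to a 2-byte memory location or to AX, with (FSTSW) or without (FNSTSW) checking for pending exceptions.
- FCLEX/FNCLEX: clear the exception flags, with (FCLEX) or without (FNCLEX) checking for pending exceptions.
- FLDENV: loads the 14-byte or 28-byte FPU environment from memory.
- FSTENV/FNSTENV: store the 14-byte or 28-byte FPU environment to memory, with (FSTENV) or without (FNSTENV) checking for pending exceptions.
- FRSTOR: restores the complete FPU state (environment and data registers) from memory.
- FSAVE/FNSAVE: save the complete FPU state to memory and reinitialize the FPU, with (FSAVE) or without (FNSAVE) checking for pending exceptions.
- FINCSTP: increments the TOP field of the FPU status word.
- FDECSTP: decrements the TOP field of the FPU status word.
- FFREE: marks ST(i) as empty.
- FNOP: no operation.
- FWAIT/WAIT: checks for and handles pending unmasked FPU exceptions.

The non-waiting forms FNINIT, FNSTENV, FNSAVE, FNSTSW, FNSTCW, and FNCLEX do not check for pending exceptions before executing.

2.7 Examples (FPU)

The examples are written in C with inline assembly ([3]). The FPU instructions used are the ones listed above; mov is the IA-32 move instruction. DWORD PTR denotes a 32-bit memory operand and TBYTE PTR an 80-bit memory operand. Single precision values in the IEEE format [1] are given in hexadecimal (base 16, prefix 0x): one sign bit (0 for positive), 8 exponent bits, and a 24-bit significand whose leading bit is implicit.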
Example 3: Because results are rounded as specified by the IEEE standard [1], testing whether a floating-point expression fpexpr equals an expected result res with

  if (fpexpr == res) printf ("SUCCESS\n"); else printf ("FAIL\n");

can fail even when the two are mathematically equal. It is safer to compare against a small tolerance eps:

  if (-eps < fpexpr - res && fpexpr - res < eps) printf ("SUCCESS\n"); else printf ("FAIL\n");

The program below computes the square root of x rounded to nearest, (sqrt(x))_rn, and then checks whether ((sqrt(x))_rn * (sqrt(x))_rn)_rn = x.

#include <stdio.h>

void main () {
  float x, y, z;
  char *px, *py;
  int i;
  unsigned short cw, *pcw; // control word and pointer to it

  pcw = &cw;
  // set control word
  cw = 0x003f; // round to nearest, 24 bits, floating-point exc. disabled
  // cw = 0x043f; // round down, 24 bits, floating-point exc. disabled
  // cw = 0x083f; // round up, 24 bits, floating-point exc. disabled
  // cw = 0x0c3f; // round to zero, 24 bits, floating-point exc. disabled
  __asm {
    mov eax, DWORD PTR pcw
    fldcw [eax]
  }

  for (i = 0 ; i < 11 ; i++) {
    x = (float)i; // x = 0.0, 1.0, ..., 10.0
    // compute y = sqrt (x)
    px = (char *)&x;
    py = (char *)&y;
    __asm {
      mov eax, DWORD PTR px
      fld DWORD PTR [eax]
      fsqrt
      mov eax, DWORD PTR py
      fstp DWORD PTR [eax]
    }
    z = y * y;
    printf ("x = %f = 0x%x\n", x, *(int *)&x);
    printf ("y = %f = 0x%x\n", y, *(int *)&y);
    printf ("z = %f = 0x%x\n", z, *(int *)&z);
    if (z == x) printf ("EQUAL\n\n"); else printf ("NOT EQUAL\n\n");
  }
}

Depending on x, the rounded product z may or may not be equal to x.

Example 4: The following program implements Example 1.

#include <stdio.h>

void main () {
  float a, b, c; // single precision numbers (of size 4 bytes)
  unsigned int u; // unsigned integer (of size 4 bytes)
  char *pa, *pb, *pc; // pointers to single precision numbers
  unsigned short sw, *psw; // status word and pointer to it
  unsigned short cw, *pcw; // control word and pointer to it

  // will compute c = a * b
  psw = &sw;
  pcw = &cw;
  // clear and read status word, set control word
  cw = 0x033f; // round to nearest, 64 bits, fp exc. disabled
  // cw = 0x073f; // round down, 64 bits, fp exc. disabled
  // cw = 0x0b3f; // round up, 64 bits, fp exc. disabled
  // cw = 0x0f3f; // round to zero, 64 bits, fp exc. disabled
  __asm {
    fclex
    mov eax, DWORD PTR pcw
    fldcw [eax]
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("BEFORE COMPUTATION sw = %4.4x\n", sw);

  pa = (char *)&a;
  u = 0x00fffffe; a = *(float *)&u; // a = 1.99999976 * 2^-126
  pb = (char *)&b;
  u = 0x3f000001; b = *(float *)&u; // b = 1.00000012 * 2^-1
  pc = (char *)&c;

  // compute c = a * b
  __asm {
    mov eax, DWORD PTR pa;
    fld DWORD PTR [eax]; // push a on the FPU stack
    mov eax, DWORD PTR pb;
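    fld DWORD PTR [eax]; // push b on the FPU stack
    fmulp st(1), st(0); // a * b in st(1), pop st(0)
    mov eax, DWORD PTR pc;
    fstp DWORD PTR [eax]; // c = a * b from FPU stack to memory, pop st(0)
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("AFTER COMPUTATION sw = %4.4x\n", sw);
  printf ("c = %8.8x = %f\n", *(unsigned int *)&c, c);
}

With rounding to nearest, the result is 1.0 * 2^-126 (inexact, rounded up):

BEFORE COMPUTATION sw = 0000
AFTER COMPUTATION sw = 0220
c = 00800000 = 0.000000

With rounding down or toward zero, the result is the largest denormal (inexact and underflow):

BEFORE COMPUTATION sw = 0000
AFTER COMPUTATION sw = 0030
c = 007fffff = 0.000000

Example 5: The FPU (x87) complies with the IEEE standard [1], but its results can differ from those of an implementation that supports only the two basic IEEE formats. The FPU stack registers hold a 15-bit exponent (against 8 bits for IEEE single precision) and up to 64 bits of significand (against 24), so intermediate results kept on the stack have extra range and precision. Consider d = (a * b) / c with a = 1.0 * 2^115, b = 1.0 * 2^125, c = 1.0 * 2^120. In pure IEEE single precision the product a * b = 1.0 * 2^240 overflows. On the FPU, a * b = 1.0 * 2^240 fits in the 15-bit exponent range of the stack register, and d = (a * b) / c = 1.0 * 2^120 is obtained. If the intermediate product is first stored to memory in single precision, the FPU gives the same result as a single precision IEEE implementation.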
The program below computes d twice: first keeping the intermediate product on the FPU stack with 64-bit precision, then storing it to memory in single precision before the division. (Example 6 discusses the corresponding effect for 53-bit precision.)

#include <stdio.h>

void main () {
  float a, b, c, d; // single precision floating-point numbers
  unsigned int u; // unsigned integer (of size 4 bytes)
  char *pa, *pb, *pc, *pd; // pointers to single precision numbers
  unsigned short sw, *psw; // status word and pointer to it

  // will compute d = (a * b) / c
  psw = &sw;
  // clear and read status word; set rounding to nearest,
  // and 64-bit precision
  __asm {
    finit
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("BEFORE COMP. sw = %4.4x\n", sw);

  pa = (char *)&a;
  u = 0x79000000; a = *(float *)&u; // a = 1.0 * 2^115
  pb = (char *)&b;
  u = 0x7e000000; b = *(float *)&u; // b = 1.0 * 2^125
  pc = (char *)&c;
  u = 0x7b800000; c = *(float *)&u; // c = 1.0 * 2^120
  pd = (char *)&d;

  // compute d = (a * b) / c holding the intermediate result
  // a * b = 2^240 on the FPU stack
  __asm {
    mov eax, DWORD PTR pa;
    fld DWORD PTR [eax]; // push a on the FPU stack
    mov eax, DWORD PTR pb;
    fld DWORD PTR [eax]; // push b on the FPU stack
    fmulp st(1), st(0); // a * b = 2^240 in st(1), pop st(0)
    mov eax, DWORD PTR pc;
    fld DWORD PTR [eax]; // push c on the FPU stack
    fdivp st(1), st(0) // st(1) / st(0) = 2^120 in st(1), pop st(0)
    mov eax, DWORD PTR pd;
    fstp DWORD PTR [eax]; // d = 2^120 from FPU stack to mem., pop st(0)
    // read status word
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("AFTER FIRST COMP. sw = %4.4x\n", sw);
  printf ("d = %8.8x = %f\n", *(unsigned int *)&d, d); // d = 2^120

  // compute d = (a * b) / c saving the intermediate result
  // a * b = 2^240 to memory
  // round to nearest, 64-bit precision, floating-point exc. disabled
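  __asm {
    fclex
    mov eax, DWORD PTR pa;
    fld DWORD PTR [eax]; // push a on the FPU stack
    mov eax, DWORD PTR pb;
    fld DWORD PTR [eax]; // push b on the FPU stack
    fmulp st(1), st(0); // a * b = 2^240 in st(1), pop st(0)
    mov eax, DWORD PTR pd;
    fstp DWORD PTR [eax]; // d = a * b from the FPU stack to mem, pop st(0)
    fld DWORD PTR [eax]; // push d = +Inf from memory on the FPU stack
    mov eax, DWORD PTR pc;
    fld DWORD PTR [eax]; // push c on the FPU stack
    fdivp st(1), st(0) // st(1) / st(0) = +Inf in st(1), pop st(0)
    mov eax, DWORD PTR pd;
    fstp DWORD PTR [eax]; // d = +Inf from the FPU stack to mem, pop st(0)
    // read status word
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("AFTER SECOND COMP. sw = %4.4x\n", sw);
  printf ("d = %8.8x = %f\n", *(unsigned int *)&d, d);
}

The first computation (intermediate result on the FPU stack) raises no exception. The second (intermediate result stored to memory) overflows, as a pure IEEE single precision computation would. The decimal value printed for 2^120 was not preserved in this transcription:

AFTER FIRST COMP. sw = 0000
d=7b800000= [...]
AFTER SECOND COMP. sw = 0028
d = 7f800000 = 1.#INF00

Example 6: Let R be an exact result, and let rn53 and rn64 denote rounding to nearest to 53 and to 64 bits. Rounding R first to 64 bits and then to 53 bits does not always give the same value as rounding it once to 53 bits, that is, ((R)_rn64)_rn53 = (R)_rn53 does not hold for every R. A similar double rounding occurs when a result that is tiny in the destination format is first rounded on the FPU stack, where the 15-bit exponent range is effectively unbounded for single precision data, to 24 or 53 bits, and is then rounded a second time (denormalized) when stored to the single precision format with its 8-bit exponent. The two roundings can differ by one ulp from a single rounding.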
The program below multiplies two single precision numbers on the FPU stack, first with 24-bit and then with 53-bit precision control, and stores each product to single precision (24-bit significand, 8-bit exponent). The hexadecimal value of a and the leading factors in the comments were not preserved in this transcription; they are shown as [...].

#include <stdio.h>

void main () {
  float a, b, c; // single precision floating-point numbers
  unsigned int u; // unsigned integer (of size 4 bytes)
  char *pa, *pb, *pc; // pointers to single precision numbers
  unsigned short sw, *psw; // status word and pointer to it
  unsigned short cw, *pcw; // control word and pointer to it

  // will compute c = a * b
  psw = &sw;
  pcw = &cw;
  // clear status flags, read status word, set control word
  cw = 0x003f; // round to nearest, 24 bits, fp exc. disabled
  __asm {
    fclex
    mov eax, DWORD PTR pcw
    fldcw [eax]
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("BEFORE FIRST COMP. sw = %4.4x\n", sw);

  pa = (char *)&a;
  u = 0x[...]; a = *(float *)&u; // a = [...] * 2^-126
  pb = (char *)&b;
  u = 0x3f080000; b = *(float *)&u; // b = 1.0625 * 2^-1
  pc = (char *)&c;
  c = 123.0; // initialize c to random value

  // compute c = a * b with 24 bits of precision;
  // result a * b with `unbounded' exponent on FPU stack
  __asm {
    mov eax, DWORD PTR pa
    fld DWORD PTR [eax] // push a on the FPU stack
    mov eax, DWORD PTR pb
    fld DWORD PTR [eax] // push b on the FPU stack
    fmulp st(1), st(0) // a * b in st(1), pop st(0)
    mov eax, DWORD PTR pc
    fstp DWORD PTR [eax] // c = a * b from FPU stack to memory, pop st(0)
    // read status word
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("AFTER FIRST COMP. sw = %4.4x\n", sw);
  printf ("AFTER FIRST COMP. c = %8.8x = %f\n", *(unsigned int *)&c, c);
  // c = [...] * 2^-126

  // clear status flags, read status word, set control word
  cw = 0x023f; // round to nearest, 53 bits, fp exc. disabled
  __asm {
    fclex
    mov eax, DWORD PTR pcw
    fldcw [eax]
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("BEFORE SECOND COMP. sw = %4.4x\n", sw);

  // compute c = a * b with 53 bits of precision;
  // result a * b with `unbounded' exponent on FPU stack
  __asm {
    mov eax, DWORD PTR pa
    fld DWORD PTR [eax] // push a on the FPU stack
    mov eax, DWORD PTR pb
    fld DWORD PTR [eax] // push b on the FPU stack
    fmulp st(1), st(0) // a * b in st(1), pop st(0)
    mov eax, DWORD PTR pc
    fstp DWORD PTR [eax] // c = a * b from FPU stack to memory, pop st(0)
    // read status word
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("AFTER SECOND COMP. sw = %4.4x\n", sw);
  printf ("AFTER SECOND COMP. c = %8.8x = %f\n", *(unsigned int *)&c, c);
  // c = [...] * 2^-126
}

The output is (the values of c were not preserved in this transcription):

BEFORE FIRST COMP. sw = 0000
AFTER FIRST COMP. sw = 0030
AFTER FIRST COMP. c = [...] = [...]
BEFORE SECOND COMP. sw = 0000
AFTER SECOND COMP. sw = 0230
AFTER SECOND COMP. c = [...] = [...]

Example 7: This example shows an unmasked divide-by-zero exception. The exception is caused by FDIVP (division by 0.0), but it is reported only when the next floating-point instruction, FSTP, executes (an FWAIT placed after FDIVP would have the same effect). The exception is caught with the __try / __except construct: the code in the __try block is guarded, and when the filter expression of __except () evaluates to EXCEPTION_EXECUTE_HANDLER the __except () block runs (see [3]).

#include <stdio.h>
#include <excpt.h>

void main () {
  float f;
  unsigned short cw, *pcw; // control word and pointer to it

  pcw = &cw;
  // clear status flags, set control word
  cw = 0x033b; // round to nearest, 64 bits, zero-divide exceptions enabled
  __asm {
    fclex
    mov eax, DWORD PTR pcw
    fldcw [eax]
  }

  __try {
    printf ("TRY BLOCK BEFORE DIVIDE BY 0\n");
    __asm {
      fldpi // load pi in ST(0)
      fldz // load 0.0 in ST(0); pi in ST(1)
      fdivp st(1), st(0) // divide ST(1) by ST(0), result in ST(1), pop
      fstp f // store ST(0) in memory and pop stack top
    }
    printf ("TRY BLOCK AFTER DIVIDE BY 0 \n");
  } __except(EXCEPTION_EXECUTE_HANDLER) {
    printf ("EXCEPT BLOCK\n");
  }
}

The output (exception taken) is:

TRY BLOCK BEFORE DIVIDE BY 0
EXCEPT BLOCK

If the FSTP instruction is removed, the pending exception is not reported inside the guarded block, and the output is:

TRY BLOCK BEFORE DIVIDE BY 0
TRY BLOCK AFTER DIVIDE BY 0

Example 8: This example shows an unmasked overflow exception with a memory destination (computing a * b and storing it in single precision).

#include <stdio.h>
#include <excpt.h>

void main () {
  float a, b, c; // single precision floating-point numbers
  unsigned int u; // unsigned integer (of size 4 bytes)
  char *pa, *pb, *pc; // pointers to single precision numbers
  unsigned short t[5], *pt;
  unsigned short sw, *psw; // status word and pointer to it
  unsigned short cw, *pcw; // control word and pointer to it

  psw = &sw;
  pcw = &cw;
  // clear exception flags, read status word,
  // set control word
  cw = 0x0337; // round to nearest, 64 bits,
               // overflow exceptions enabled
  __asm {
    fclex
    mov eax, DWORD PTR pcw
    fldcw [eax]
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("BEFORE COMP. sw = %4.4x\n", sw);

  pa = (char *)&a;
  u = 0x79000000; a = *(float *)&u; // a = 1.0 * 2^115
  pb = (char *)&b;
  u = 0x7e000000; b = *(float *)&u; // b = 1.0 * 2^125
  pc = (char *)&c;
  c = 0.0;
  pt = t;

  __try {
    printf ("TRY BLOCK BEFORE OVERFLOW\n");
    // compute c = a * b
    __asm {
      mov eax, DWORD PTR pa
      fld DWORD PTR [eax] // push a on the FPU stack
      mov eax, DWORD PTR pb
      fld DWORD PTR [eax] // push b on the FPU stack
      fmulp st(1), st(0) // a * b in st(1), pop st(0)
      // cause the overflow exception
      mov eax, DWORD PTR pc
      fstp DWORD PTR [eax] // c = a * b from FPU stack to memory, pop st(0)
      fwait // trigger floating-point exception if any
    }
    printf ("TRY BLOCK AFTER OVERFLOW\n");
  } __except(EXCEPTION_EXECUTE_HANDLER) {
    printf ("EXCEPT BLOCK\n");
    // clear exception flags, read status word,
    // set control word
    cw = 0x033f; // round to nearest, 64 bits,
                 // exceptions disabled
    __asm {
      mov eax, DWORD PTR psw
      fnstsw [eax]
      fnclex
      mov eax, DWORD PTR pcw
      fldcw [eax]
    }
    printf ("sw = %4.4x\n", sw); // sw=0xb888: B=1, TOP=111, ES=1, OE=1
    __asm {
      mov eax, DWORD PTR pt
      fstp TBYTE PTR [eax] // a * b from FPU stack to memory (80 bits), pop st(0)
    }
    printf ("t = %4.4x%4.4x%4.4x%4.4x%4.4x\n", t[4],t[3],t[2],t[1],t[0]); // t = 2^240
  }
}
The exception handler reads the FPU status word (sw=0xb888, that is B=1, TOP=111, ES=1, OE=1) and then stores the value found on top of the FPU stack, which is still the unrounded product 2^240 (biased exponent 0x40ef). The output is:

BEFORE COMP. sw = 0000
TRY BLOCK BEFORE OVERFLOW
EXCEPT BLOCK
sw = b888
t = 40ef8000000000000000

Because the destination of the faulting FSTP is memory, FSTP stores nothing and leaves its source operand on the FPU stack when the 32-bit store overflows.

Example 9: This example shows an unmasked overflow exception with a destination on the FPU stack: two double-extended numbers held on the FPU stack are multiplied (a * b).

#include <stdio.h>
#include <float.h>
#include <excpt.h>

void main () {
  unsigned short a[5], b[5], c[5], *pa, *pb, *pc;
  unsigned short sw, *psw; // status word and pointer to it
  unsigned short cw, *pcw; // control word and pointer to it

  psw = &sw;
  pcw = &cw;
  // clear exception flags, read status word,
  // set control word
  cw = 0x0b37; // round up, 64 bits, overflow exc. enabled
  __asm {
    fclex
    mov eax, DWORD PTR pcw
    fldcw [eax]
    mov eax, DWORD PTR psw
    fstsw [eax]
  }
  printf ("BEFORE COMP. sw = %4.4x\n", sw);

  // a and b are approximately 1.0 * 2^16000 (the last significand bit is set)
  a[4] = 0x7e7f; a[3] = 0x8000; a[2] = 0x0000; a[1] = 0x0000; a[0] = 0x0001;
  b[4] = 0x7e7f; b[3] = 0x8000; b[2] = 0x0000; b[1] = 0x0000; b[0] = 0x0001;
  pa = a;
  pb = b;
  pc = c;

  __try {
    printf ("TRY BLOCK BEFORE OVERFLOW\n");
    // compute c = a * b
    __asm {
      mov eax, DWORD PTR pa
      fld TBYTE PTR [eax] // push a on the FPU stack
      mov eax, DWORD PTR pb
      fld TBYTE PTR [eax] // push b on the FPU stack
      fmulp st(1), st(0) // a * b in st(1), pop st(0)
      fwait // trigger floating-point exception if any
    }
    printf ("TRY BLOCK AFTER OVERFLOW\n");
  } __except(EXCEPTION_EXECUTE_HANDLER) {
    printf ("EXCEPT BLOCK\n");
    // clear exceptions, read status word, set control word
    cw = 0x0b3f; // round up, 64 bits, exceptions disabled
    __asm {
      mov eax, DWORD PTR psw
      fnstsw [eax]
      fnclex
      mov eax, DWORD PTR pcw
      fldcw [eax]
    }
    printf ("sw = %4.4x\n", sw); // sw=0xbaa8:
                                 // B=1, TOP=111, C1=1, ES=1, PE=1, OE=1
    __asm {
      mov eax, DWORD PTR pc
      fstp TBYTE PTR [eax] // c = a * b from FPU stack to memory, pop st(0)
    }
    printf ("c = %4.4x%4.4x%4.4x%4.4x%4.4x\n", c[4],c[3],c[2],c[1],c[0]);
    // c = 2^32000 / 2^24576 = 2^7424 (biased exponent is 0x5cff)
  }
}

The output is (the significand digits of c were not preserved in this transcription):

BEFORE COMP. sw = 0000
TRY BLOCK BEFORE OVERFLOW
EXCEPT BLOCK
sw = baa8
c = 5cff[...]

The result left on the FPU stack has its exponent scaled down by 24576 (biased exponent 0x5cff = 23807 = 7424 + 16383). The FPU status word (sw = baa8) shows B=1, TOP=111, C1=1, ES=1, PE=1, OE=1; C1=1 indicates that the result of FMUL was rounded up.
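3. The Streaming SIMD Extensions (SSE)

The Streaming SIMD Extensions (SSE) operate on single precision floating-point values, either packed (several values per register) or scalar. SSE results are rounded once to single precision (unlike the FPU, which can hold intermediate results with extra range and precision), and SSE handles zeros, denormals, infinities, and NaNs. Typical uses are 2D and 3D graphics.

3.1 The SIMD Floating-Point Registers

SSE adds eight 128-bit registers (Figure 3). They are addressed directly by name, not as a stack (the FPU needs FXCH to reorder its stack), and they are separate from the IA-32 general-purpose registers.

Figure 3: SIMD floating-point registers
  XMM7 XMM6 XMM5 XMM4 XMM3 XMM2 XMM1 XMM0

Each SIMD floating-point register holds four single precision values (Figure 4); X1 is the lowest-order value.

Figure 4: Packed single precision data type
  X4 X3 X2 X1

A 128-bit operand in memory occupies 16 bytes. SSE instructions come in two forms: packed instructions operate on all four values, and scalar instructions operate on the lowest value only, leaving the upper three unchanged.

3.2 The SIMD Floating-Point Control/Status Register

SSE has one 32-bit control/status register, MXCSR (Figure 5). Bits 31 to 16 are reserved and bit 6 is reserved (0). Moving between FPU or MMX code and SSE code needs no special handling of MXCSR, which is independent of the FPU control and status words.

Figure 5: SIMD floating-point control/status register (MXCSR)
  bit:   31-16      15   14-13   12   11   10   9    8    7    6     5    4    3    2    1    0
         reserved   FZ   RC      PM   UM   OM   ZM   DM   IM   Res   PE   UE   OE   ZE   DE   IE

- Bits 5 to 0: the exception flags PE, UE, OE, ZE, DE, IE.
- Bits 12 to 7: the exception masks PM, UM, OM, ZM, DM, IM.
- Bits 14 and 13: rounding control (RC), selecting one of the four rounding modes of the IEEE standard [1] (RC=00B to nearest, RC=01B down, RC=10B up, RC=11B toward zero).
- Bit 15 (FZ): flush-to-zero mode.

MXCSR has no precision control (PC) field: SSE always computes in single precision. Flush-to-zero (the FZ bit of MXCSR) is specific to SSE and has no FPU counterpart; it is off (0) by default.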
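The flags in bits 5 to 0 (PE, UE, OE, ZE, DE, IE) are set to 1 when the corresponding exception is detected. As in the FPU, they stay set until software clears them. For a packed SSE instruction, the flags written to MXCSR are the logical OR of the flags of the four sub-operations.

Example 10: In SSE, multiplying two single precision numbers as in the IEEE standard takes one instruction, MULSS, where the FPU needs two (FMUL and FST) unless FMUL is run with 24-bit precision control. The operands a and b of Example 1 are multiplied in single precision (24 bits) in each of the four rounding modes. Without flush-to-zero, the results and flags are those of Example 1. With flush-to-zero enabled, the results that underflow are replaced by zero:
- round to nearest: inexact (P)
- round down: a * b = +0.0 (P, U)
- round up: inexact (P)
- round toward zero: a * b = +0.0 (P, U)

3.3 SIMD Floating-Point Exceptions

SSE recognizes the same six classes of floating-point exceptions as the FPU, each with a flag and a mask bit in MXCSR. A masked exception sets its flag and delivers a default result. An unmasked exception invokes a software handler.

Differences from the FPU:
- For a packed instruction, each of the sub-operations is checked, and the flags reported in MXCSR are the OR of the individual flags. If any sub-operation raises an unmasked exception, no result is written.
- SIMD floating-point exceptions are reported through exception 19, while FPU exceptions use exception 16.
- SIMD floating-point exceptions are reported by the instruction that causes them; there is no delay until the next floating-point instruction as in the FPU.
- COMISS and UCOMISS (comparisons) report their result in EFLAGS, as the x87 FCOMI family does.
- An SNaN operand with the invalid exception masked yields a QNaN (the SNaN converted to a QNaN).
- MXCSR is separate from the FPU control and status words, so FPU and SSE exceptions are masked and flagged independently.

Pre-computation exceptions (invalid, denormal, divide-by-zero) and post-computation exceptions (overflow, underflow, inexact) are prioritized as in the FPU, applied across the sub-operations of a packed instruction.

3.4 Flush-to-Zero Mode

When flush-to-zero is enabled (MXCSR bit FZ set) and the underflow exception is masked (UM set), a result that underflows is replaced by zero with the sign of the true result, and the inexact and underflow flags are set. This differs from the IEEE standard [1], which requires a denormal result, in exchange for speed in applications where underflow is common and a zero result is acceptable. Comparisons (COMISS, UCOMISS) are unaffected; they only write EFLAGS. The x87 FPU has no equivalent mode.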
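3.5 NaN Handling

SSE handles QNaN and SNaN operands as the FPU does (masked or unmasked invalid exception), except for operations with two NaN operands: where the FPU selects the NaN with the larger significand (Table 2), SSE returns the first operand, as shown in Table 3.

Table 3: SSE results for operations with QNaN and SNaN operands
  Source operands                               Result
  SNaN and QNaN                                 the first operand (if it is the SNaN, converted to a QNaN)
  Two SNaNs                                     the first operand, converted to a QNaN
  Two QNaNs                                     the first operand
  SNaN and a real value                         the SNaN, converted to a QNaN
  QNaN and a real value                         the QNaN source operand
  No NaN operand, invalid operation detected    the QNaN real indefinite

3.6 SSE Instructions

SSE operands are SIMD floating-point registers, memory, and for some instructions MMX registers or 32-bit IA-32 general-purpose registers (see [2]). Instructions ending in PS (packed single precision) operate on all four values; instructions ending in SS (scalar single precision) operate on the lowest value. The exceptions each instruction can raise are listed (I, D, Z, O, U, P).

1. Data transfer
- MOVAPS/MOVUPS: move aligned/unaligned packed single precision floating-point; transfers 128 bits between SIMD registers or between a SIMD register and memory. Exceptions: none.
- MOVHPS/MOVLPS: move high/low packed single precision floating-point; transfers 64 bits between memory and the high/low half of a SIMD register. Exceptions: none.
- MOVHLPS/MOVLHPS: move high to low / low to high packed single precision floating-point; copies the high (low) 64 bits of one register to the low (high) 64 bits of another. Exceptions: none.
- MOVMSKPS: move mask packed single precision floating-point to r32; copies the four sign bits to a 32-bit IA-32 register r32. Exceptions: none.
- MOVSS: move scalar single precision floating-point; transfers the lowest 32 bits between SIMD registers or between a SIMD register and memory. Exceptions: none.

2. Arithmetic
- ADDPS/ADDSS/SUBPS/SUBSS/MULPS/MULSS: add/subtract/multiply packed/scalar single precision floating-point; the first operand is a SIMD register, the second a SIMD register or memory. Exceptions: I, D, O, U, P.
- DIVPS/DIVSS: divide packed/scalar single precision floating-point; same operand forms. Exceptions: I, Z, D, O, U, P.
- SQRTPS/SQRTSS: square root packed/scalar single precision floating-point. Exceptions: I, D, P.

3. Maximum and minimum
- MAXPS/MAXSS/MINPS/MINSS: maximum/minimum packed/scalar single precision floating-point. Exceptions: I, D (I is raised for any NaN operand).

4. Comparison
- CMPPS/CMPSS: compare packed/scalar single precision floating-point; each result is a 32-bit mask of all 1s (true) or all 0s (false). Exceptions: I, D (for the predicates lt, le, nlt, nle I is raised for any NaN operand; for the others only for an SNaN).
- COMISS/UCOMISS: compare scalar single precision floating-point ordered/unordered and set EFLAGS (ZF, PF, CF). Exceptions: I, D (COMISS raises I for any NaN operand, UCOMISS only for an SNaN).

5. Conversion
- CVTPI2PS: converts two 32-bit integers from an MMX register or memory to two single precision values in the low half of a SIMD register. Exceptions: P.
- CVTSI2SS: converts one 32-bit integer to one single precision value in the lowest position of a SIMD register. Exceptions: P.
- CVTPS2PI/CVTTPS2PI: convert the two low single precision values of a SIMD register to two 32-bit integers in an MMX register; CVTTPS2PI truncates regardless of the rounding mode in MXCSR. Exceptions: I, P.
- CVTSS2SI/CVTTSS2SI: convert the lowest single precision value to a 32-bit integer; CVTTSS2SI truncates regardless of the rounding mode in MXCSR. Exceptions: I, P.

6. Logical (bitwise)
- ANDPS/ANDNPS/ORPS/XORPS: packed logical AND, AND-NOT, OR, XOR. Exceptions: none.

7. Approximations
- RCPPS/RCPSS: packed/scalar single precision floating-point reciprocal approximation. Exceptions: none.
- RSQRTPS/RSQRTSS: packed/scalar single precision floating-point square root reciprocal approximation. Exceptions: none.

8. State management
- FXSAVE/FXRSTOR: save/restore the FP/MMX state and the SIMD state to/from a 512-byte memory area: CS (code segment), IP (instruction pointer), FOP (opcode), FTW (FPU tag word), FSW (FPU status word), FCW (FPU control word), MXCSR (SIMD control/status register), DS (data segment), DP (data pointer), the eight FPU/MMX registers, and the eight SIMD registers. Exceptions: none.
- STMXCSR/LDMXCSR: store/load the 32-bit SIMD control/status register to/from memory. Exceptions: none.

FXSAVE and FXRSTOR take the place of FSAVE and FRSTOR for saving and restoring state when SSE is in use. Scalar SSE instructions work on the lowest 32 bits of a SIMD register and leave the rest unchanged. SSE instructions can be mixed with x87 and MMX instructions, since the SIMD registers are separate from the FPU/MMX registers.

3.7 Enabling SSE

SSE is available in all IA-32 operating modes (including virtual-8086 mode) provided the following conditions hold:
- SSE emulation disabled: CR0.EM (bit 2) = 0
- SSE supported by the processor: CPUID.XMM (EDX bit 25) = 1
- FXSAVE/FXRSTOR supported by the processor: CPUID.FXSR (EDX bit 24) = 1
- OS support for saving and restoring the SIMD FP state: CR4.OSFXSR (bit 9) = 1

If these conditions are not met, SSE instructions raise an invalid opcode exception (see [2]). To receive unmasked SIMD floating-point exceptions, the following is also needed:
- OS support for SIMD floating-point exceptions: CR4.OSXMMEXCPT (bit 10) = 1

3.8 Examples (SSE)

Example 11: An SSE SIMD divide of (1.0, 1.0, 1.0, 1.0) by ([...], 0.0, [...], SNaN), where the first divisor is a denormal and the third is the largest finite single precision number (these two values were not fully preserved in this transcription). All exceptions are masked and flush-to-zero is enabled in MXCSR. The program reads MXCSR before and after the divide. The SIMD result is (+inf, +inf, 0.0, QNaN), and all six exception flags are set to 1, each by at least one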
of the four sub-operations (the SNaN operand raises the invalid exception, and the flags in MXCSR are the OR of the flags of the four divisions).

#include <stdio.h>

void main () {
  char *mem;
  unsigned int uimem[4];
  int mxcsr, *pmxcsr;

  mem = (char *)uimem;
  // set and then read new value of MXCSR
  mxcsr = 0x00009f80;
  // ftz = 1, rc = 00 (to nearest), traps disabled, flags clear
  pmxcsr = &mxcsr;
  __asm {
    mov eax, DWORD PTR pmxcsr
    ldmxcsr [eax]
    stmxcsr [eax]
  }
  printf ("BEFORE SIMD DIVIDE: MXCSR = 0x%8.8x\n", mxcsr);

  // load first set of operands
  uimem[0] = 0x3f800000; // 1.0
  uimem[1] = 0x3f800000; // 1.0
  uimem[2] = 0x3f800000; // 1.0
  uimem[3] = 0x3f800000; // 1.0
  __asm {
    mov eax, DWORD PTR mem;
    movups XMM1, [eax];
  }

  // load second set of operands
  uimem[0] = 0x[...]; // [...] * 2^-126 (value lost in transcription)
  uimem[1] = 0x00000000; // 0.0
  uimem[2] = 0x7f7fffff; // 1.99999988 * 2^127
  uimem[3] = 0x7fbf0000; // SNaN
  __asm {
    mov eax, DWORD PTR mem;
    movups XMM2, [eax];
    // perform SIMD divide and store result to memory
    divps XMM1, XMM2;
    mov eax, DWORD PTR mem;
    movups [eax], XMM1;
    // read new value of MXCSR
    mov eax, DWORD PTR pmxcsr
    stmxcsr [eax]
  }
  printf ("AFTER SIMD DIVIDE: MXCSR = 0x%8.8x\n", mxcsr);
  printf ("res = %8.8x %8.8x %8.8x %8.8x = %f %f %f %f\n",
    uimem[0], uimem[1], uimem[2], uimem[3],
    *(float *)&uimem[0], *(float *)&uimem[1],
    *(float *)&uimem[2], *(float *)&uimem[3]);
}

The output is:

BEFORE SIMD DIVIDE: MXCSR = 0x00009f80
AFTER SIMD DIVIDE: MXCSR = 0x00009fbf
res = 7f800000 7f800000 00000000 7fff0000 = 1.#INF00 1.#INF00 0.000000 1.#QNAN0

MOVUPS is used to move data between memory and the SIMD registers because the memory operand is not guaranteed to be aligned on a 16-byte boundary. MOVAPS requires 16-byte alignment.

Example 12: SSE is used to compute 1.0 / (sqrt (a) - 1.0). For a close to 1.0 (the last operand below is a = 1.0 + 2^-23, one ulp above 1.0), the subtraction sqrt (a) - 1.0 cancels all significant bits in single precision: the exact result R is approximately 2^24, but the single precision computation returns infinity. The operands are loaded into XMM1.

#include <stdio.h>
#include <stdlib.h>

void main () {
  char *mem;
  unsigned int *uimem;

  mem = (char *)(((int)malloc (144) + 16) & ~0x0f); // 16-byte aligned
  uimem = (unsigned int *)mem;

  // load x[i] in XMM1, i = 0,3
  uimem[0] = 0x40000000; // 2.0
  uimem[1] = 0x40400000; // 3.0
  uimem[2] = 0x40800000; // 4.0
  uimem[3] = 0x3f800001; // 1.0 + 1 ulp (1.0 + 2^-23)
  __asm {
    mov eax, DWORD PTR mem;
    movaps XMM1, [eax];
  }

  // load y[i] = 1.0 in XMM2, i = 0,3
  uimem[0] = 0x3f800000; // 1.0
  uimem[1] = 0x3f800000; // 1.0
  uimem[2] = 0x3f800000; // 1.0
  uimem[3] = 0x3f800000; // 1.0
  __asm {
    mov eax, DWORD PTR mem;
    movaps XMM2, [eax];
    // calculate 1.0 / (sqrt (x[i]) - 1.0), i = 0,3
    // calculate sqrt (x[i]) in XMM1, i = 0,3
    sqrtps XMM1, XMM1;
    // calculate sqrt (x[i]) - 1.0 in XMM1, i = 0,3
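    subps XMM1, XMM2;
    // calculate 1.0 / (sqrt (x[i]) - 1.0) in XMM2, i = 0,3
    divps XMM2, XMM1;
    // store result in memory
    mov eax, DWORD PTR mem;
    movaps [eax], XMM2;
  }
  printf ("res = %8.8x %8.8x %8.8x %8.8x = %f %f %f %f\n",
    uimem[0], uimem[1], uimem[2], uimem[3],
    *(float *)&uimem[0], *(float *)&uimem[1],
    *(float *)&uimem[2], *(float *)&uimem[3]);
}

The output is:

res = 401a827a 3faed9ec 3f800000 7f800000 = 2.414214 1.366025 1.000000 1.#INF00

For a = 1.0 + 2^-23 the result is infinity (a division by zero, since sqrt (a) rounds to 1.0 in single precision). The same computation is repeated with SSE2 in double precision in Example 13.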
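4. The Streaming SIMD Extensions 2 (SSE2)

The Streaming SIMD Extensions 2 (SSE2) extend the IA with further SIMD instructions beyond MMX technology and SSE. For floating-point, SSE2 adds operations on double precision values, packed (two values per register) or scalar. As with SSE, results are rounded once to the destination format (unlike the FPU), and zeros, denormals, infinities, and NaNs are handled.

4.1 The SIMD Floating-Point Registers

SSE2 uses the same SIMD floating-point registers as SSE (Figure 3), the XMM registers, so the operating system support required for SSE also covers SSE2. For SSE2 each SIMD register holds two double precision values (Figure 6); X1 is the lower-order value.

Figure 6: Packed double precision data type
  X2 X1

4.2 The SIMD Floating-Point Control/Status Register

SSE2 uses the same control/status register, MXCSR, as SSE. There is no precision control (PC) field: SSE2 floating-point instructions compute in the precision of their operands. For a packed SSE2 instruction, the flags written to MXCSR are the logical OR of the flags of the two sub-operations.

4.3 SIMD Floating-Point Exceptions

SSE2 recognizes the same six classes of floating-point exceptions as the FPU and SSE, with flags and masks in MXCSR (shared with SSE). The differences from the FPU are the same as for SSE (Section 3.3): the flags of the sub-operations are ORed; exceptions are reported through exception 19 instead of exception 16; exceptions are reported by the faulting instruction; the comparison instructions report their result in EFLAGS; and MXCSR is independent of the FPU control and status words. Exception priorities across the two sub-operations of a packed instruction follow the same rules as for SSE.

4.4 Flush-to-Zero Mode

As in SSE, when flush-to-zero is enabled (MXCSR bit FZ set) and the underflow exception is masked (UM set), a result that underflows is replaced by a correctly signed zero and the inexact and underflow flags are set. The comparison instructions only write EFLAGS and are unaffected. The x87 FPU has no equivalent mode. This mode departs from the IEEE standard [1].

4.5 NaN Handling

SSE2 handles QNaN and SNaN operands as SSE does. For operations with two NaN operands, SSE2 follows the SSE rules of Table 3, not the FPU rules of Table 2.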
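4.6 SSE2 Instructions

SSE2 floating-point operands are SIMD floating-point registers, memory, and for some instructions MMX registers or 32-bit IA-32 general-purpose registers (see [2]). Instructions ending in PD (packed double precision) operate on both values; instructions ending in SD (scalar double precision) operate on the lower value. The SSE2 floating-point instructions are listed below with the exceptions each can raise.

1. Data transfer
- MOVAPD/MOVUPD: move aligned/unaligned packed double precision floating-point; transfers 128 bits between SIMD registers or between a SIMD register and memory. Exceptions: none.
- MOVHPD/MOVLPD: move high/low packed double precision floating-point; transfers 64 bits between memory and the high/low half of a SIMD register. Exceptions: none.
- MOVMSKPD: move mask packed double precision floating-point to r32; copies the two sign bits to a 32-bit IA-32 register r32. Exceptions: none.
- MOVSD: move scalar double precision floating-point; transfers the lower 64 bits between SIMD registers or between a SIMD register and memory. Exceptions: none.

2. Arithmetic
- ADDPD/ADDSD/SUBPD/SUBSD/MULPD/MULSD: add/subtract/multiply packed/scalar double precision floating-point; the first operand is a SIMD register, the second a SIMD register or memory. Exceptions: I, D, O, U, P.
- DIVPD/DIVSD: divide packed/scalar double precision floating-point; same operand forms. Exceptions: I, Z, D, O, U, P.
- SQRTPD/SQRTSD: square root packed/scalar double precision floating-point. Exceptions: I, D, P.

3. Maximum and minimum
- MAXPD/MAXSD/MINPD/MINSD: maximum/minimum packed/scalar double precision floating-point. Exceptions: I, D (I is raised for any NaN operand).

4. Comparison
- CMPPD/CMPSD: compare packed/scalar double precision floating-point; each result is a 64-bit mask of all 1s (true) or all 0s (false). Exceptions: I, D (for the predicates lt, le, nlt, nle I is raised for any NaN operand; for the others only for an SNaN).
- COMISD/UCOMISD: compare scalar double precision floating-point ordered/unordered and set EFLAGS (ZF, PF, CF). Exceptions: I, D (COMISD raises I for any NaN operand, UCOMISD only for an SNaN).

5. Conversion
- CVTPD2PI: converts the two double precision values of a SIMD register to two 32-bit integers in an MMX register, using the rounding mode in MXCSR.
- CVTSD2SI: converts the lower double precision value of a SIMD register to one 32-bit integer in an IA-32 register, using the rounding mode in MXCSR.
- CVTTPD2PI: like CVTPD2PI, with truncation.
- CVTTSD2SI: like CVTSD2SI, with truncation.
- CVTPI2PD: converts two 32-bit integers in an MMX register to two double precision values in a SIMD register.
- CVTSI2SD: converts a 32-bit integer in an IA-32 register to a double precision value in a SIMD register.
- CVTPD2DQ/CVTTPD2DQ: convert the two double precision values of a SIMD register to two 32-bit integers in a SIMD register; CVTPD2DQ uses the rounding mode in MXCSR, CVTTPD2DQ truncates.
- CVTDQ2PD: converts two 32-bit integers in a SIMD register to two double precision values in a SIMD register.
- CVTPS2PD: converts two single precision values in a SIMD register to two double precision values.
- CVTSS2SD: converts a scalar single precision value to a scalar double precision value.
- CVTPD2PS: converts two double precision values in a SIMD register to two single precision values.
- CVTSD2SS: converts a scalar double precision value to a scalar single precision value.
- CVTPS2DQ/CVTTPS2DQ: convert four single precision values in a SIMD register to four 32-bit integers in a SIMD register; CVTPS2DQ uses the rounding mode in MXCSR, CVTTPS2DQ truncates.
- CVTDQ2PS: converts four 32-bit integers in a SIMD register to four single precision values.

6. Logical (bitwise)
- ANDPD/ANDNPD/ORPD/XORPD: packed logical AND, AND-NOT, OR, XOR. Exceptions: none.

7. State management
- SSE2 adds no state management instructions of its own: the SSE instructions (FXSAVE, FXRSTOR, STMXCSR, LDMXCSR) are used.

Scalar SSE2 instructions work on the lower 64 bits of a SIMD register and leave the rest unchanged, in the same way as scalar SSE instructions.

4.7 Enabling SSE2

SSE2 is available in all IA-32 operating modes (including virtual-8086 mode) provided the following conditions hold:
- SSE2 emulation disabled: CR0.EM (bit 2) = 0
- SSE2 supported by the processor: CPUID.WNI = 1
- FXSAVE/FXRSTOR supported by the processor: CPUID.FXSR (EDX bit 24) = 1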
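- OS support for saving and restoring the SIMD FP state: CR4.OSFXSR (bit 9) = 1

If these conditions are not met, SSE2 instructions raise an invalid opcode exception (see [2]). To receive unmasked SIMD floating-point exceptions from SSE2 instructions, the following is also needed:
- OS support for SIMD floating-point exceptions: CR4.OSXMMEXCPT (bit 10) = 1

4.8 Examples (SSE2)

Example 13: SSE2 is used to compute 1.0 / (sqrt (a) - 1.0). Example 12 used SSE to compute 1.0 / (sqrt (a) - 1.0) in single precision, and for a = 1.0 + 2^-23 the result was infinity although the exact result R is approximately 2^24. Here the same a is processed in double precision. The operands are loaded into XMM1.

#include <stdio.h>
#include <stdlib.h>

void main () {
  char *mem;
  unsigned int *uimem;

  mem = (char *)(((int)malloc (144) + 16) & ~0x0f); // 16-byte aligned
  // printf ("mem = %x\n\n", (int)mem);
  uimem = (unsigned int *)mem;

  // load x[i] in XMM1, i = 0,1
  uimem[1] = 0x40000000; uimem[0] = 0x00000000;
  // 2.0 (in uimem[1], uimem[0])
  uimem[3] = 0x3ff00000; uimem[2] = 0x20000000;
  // 1.0 + 2^-23 (in uimem[3], uimem[2])
  __asm {
    mov eax, DWORD PTR mem;
    movaps XMM1, [eax];
  }

  // load y[i] = 1.0 in XMM2, i = 0,1
  uimem[1] = 0x3ff00000; uimem[0] = 0x00000000; // 1.0
  uimem[3] = 0x3ff00000; uimem[2] = 0x00000000; // 1.0
  __asm {
    mov eax, DWORD PTR mem;
    movaps XMM2, [eax];
    // calculate 1.0 / (sqrt (x[i]) - 1.0), i = 0,1
    // calculate sqrt (x[i]) in XMM1, i = 0,1
    sqrtpd XMM1, XMM1;
    // calculate sqrt (x[i]) - 1.0 in XMM1, i = 0,1
    subpd XMM1, XMM2;
    // calculate 1.0 / (sqrt (x[i]) - 1.0) in XMM2, i = 0,1
    divpd XMM2, XMM1;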
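    // store result in memory
    mov eax, DWORD PTR mem;
    movaps [eax], XMM2;
  }
  printf ("res = %8.8x%8.8x %8.8x%8.8x = %f %f\n",
    uimem[1], uimem[0], uimem[3], uimem[2],
    *(double *)&uimem[0], *(double *)&uimem[2]);
}

The output is (only a fragment of the printed values was preserved in this transcription):

res = [...]f333f9de[...] [...] = [...] [...]

The second result (in uimem[3] and uimem[2]) is the value R* computed for a = 1.0 + 2^-23. Compared with the exact result R, which is approximately 2^24, its relative error is ε = (R - R*) / R. In contrast with the single precision computation of Example 12, which returned infinity, the double precision result is finite and ε is small. The numeric values of R, R*, and ε were not preserved in this transcription.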
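5. Comparison of the FPU, SSE, and SSE2

Table 4 summarizes the main differences among the IA-32 FPU, SSE, and SSE2. Only the rows that could be recovered from this transcription are shown.

Table 4: Comparison of the IA-32 FPU, SSE, and SSE2
  Feature                      FPU                           SSE                              SSE2
  Operating system support     FPU state and exceptions      needs OS support for the         needs OS support for the
                                                             SIMD state and exceptions        SIMD state and exceptions
  Data per instruction         scalar values                 4 single precision values        2 double precision values
                                                             (SIMD) or 1 scalar               (SIMD) or 1 scalar
  Registers                    8 registers used as a stack   8 SIMD registers, addressed      the same SIMD registers
                                                             directly                         as SSE
  Control and status           FPU control word and          MXCSR                            MXCSR (shared with SSE)
                               FPU status word
  Status flags                 one operation per             logical OR over the 4            logical OR over the 2
                               instruction                   sub-operations                   sub-operations
  Pre-computation exceptions   I, D, Z                       I, D, Z                          I, D, Z
  Post-computation exceptions  O, U, P                       O, U, P                          O, U, P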
Table 4: Comparison of the IA-32 FPU, SSE, and SSE2 (continued)
  Feature                      FPU                           SSE                              SSE2
  Precision of results         set by precision control;     IEEE single precision            IEEE double precision
                               matches IEEE 754 single or    (IEEE 754)                       (IEEE 754)
                               double only under the
                               conditions of Section 2
  Two NaN operands             NaN with the larger           first operand                    first operand
                               significand

Example 14: The same expression is evaluated with the FPU, SSE, and SSE2 to compare accuracy:

  (((1 / ((1 / 10) / (1 / 3)) + 3 / 10) / 11) * (1 / (1 / 99) + 11)) * 39 = 1417

The SSE version (single precision, four identical computations in parallel) is:

#include <stdio.h>

void main () {
  float res[4], *pres = res,
    a1[4] = {1.0, 1.0, 1.0, 1.0}, *pa1 = a1,
    a3[4] = {3.0, 3.0, 3.0, 3.0}, *pa3 = a3,
    a10[4] = {10.0, 10.0, 10.0, 10.0}, *pa10 = a10,
    a11[4] = {11.0, 11.0, 11.0, 11.0}, *pa11 = a11,
    a39[4] = {39.0, 39.0, 39.0, 39.0}, *pa39 = a39,
    a99[4] = {99.0, 99.0, 99.0, 99.0}, *pa99 = a99;

  __asm {
    mov eax, DWORD PTR pa1
    movups XMM5, [eax] // 1 in xmm5
    movaps XMM1, XMM5 // 1 in xmm1
    mov eax, DWORD PTR pa10
    movups XMM2, [eax] // 10 in xmm2
    divps XMM1, XMM2 // 1/10 in xmm1
    movaps XMM2, XMM5 // 1 in xmm2
    mov eax, DWORD PTR pa3
    movups XMM3, [eax] // 3 in xmm3
    divps XMM2, XMM3 // 1/3 in xmm2
    divps XMM1, XMM2 // 3/10 in xmm1
    movaps XMM2, XMM5 // 1 in xmm2
    divps XMM2, XMM1 // 10/3 in xmm2
    mov eax, DWORD PTR pa10
    movups XMM1, [eax] // 10 in xmm1
    divps XMM3, XMM1 // 3/10 in xmm3
    addps XMM2, XMM3 // 109/30 in xmm2
    mov eax, DWORD PTR pa11
    movups XMM1, [eax] // 11 in xmm1
    divps XMM2, XMM1 // 109/330 in xmm2
    mov eax, DWORD PTR pa99
    movups XMM3, [eax] // 99 in xmm3
    movups XMM4, XMM5 // 1 in xmm4
    divps XMM4, XMM3 // 1/99 in xmm4
    divps XMM5, XMM4 // 99 in xmm5
    addps XMM1, XMM5 // 110 in xmm1
    mulps XMM1, XMM2 // 109/3 in xmm1
    mov eax, DWORD PTR pa39
    movups XMM2, [eax] // 39 in xmm2
    mulps XMM1, XMM2 // 1417 in xmm1
    mov eax, DWORD PTR pres;
    movups [eax], XMM1;
  }
  printf ("res = \n\t%8.8x %8.8x %8.8x %8.8x = \n\t%f %f %f %f\n",
    *(unsigned int *)&res[0], *(unsigned int *)&res[1],
    *(unsigned int *)&res[2], *(unsigned int *)&res[3],
    res[0], res[1], res[2], res[3]);
}

The result in IEEE single precision is

res =
	44b12001 44b12001 44b12001 44b12001 =
	1417.000122 1417.000122 1417.000122 1417.000122

that is, res = 1417 + 1 ulp. The numeric values of the ulp, the absolute error e, and the relative error ε1 were not preserved in this transcription. The same result is obtained on the FPU when precision control is set to 24 bits, since the FPU then rounds every operation as an IEEE single precision implementation would.

The SSE2 version (double precision, two identical computations in parallel) is:
#include <stdio.h>

void main () {
  double res[2], *pres = res,
    a1[2] = {1.0, 1.0}, *pa1 = a1,
    a3[2] = {3.0, 3.0}, *pa3 = a3,
    a10[2] = {10.0, 10.0}, *pa10 = a10,
    a11[2] = {11.0, 11.0}, *pa11 = a11,
    a39[2] = {39.0, 39.0}, *pa39 = a39,
    a99[2] = {99.0, 99.0}, *pa99 = a99;
  unsigned int *uint;

  uint = (unsigned int *)res;
  __asm {
    mov eax, DWORD PTR pa1
    movupd XMM5, [eax] // 1 in xmm5
    movapd XMM1, XMM5 // 1 in xmm1
    mov eax, DWORD PTR pa10
    movupd XMM2, [eax] // 10 in xmm2
    divpd XMM1, XMM2 // 1/10 in xmm1
    movapd XMM2, XMM5 // 1 in xmm2
    mov eax, DWORD PTR pa3
    movupd XMM3, [eax] // 3 in xmm3
    divpd XMM2, XMM3 // 1/3 in xmm2
    divpd XMM1, XMM2 // 3/10 in xmm1
    movapd XMM2, XMM5 // 1 in xmm2
    divpd XMM2, XMM1 // 10/3 in xmm2
    mov eax, DWORD PTR pa10
    movupd XMM1, [eax] // 10 in xmm1
    divpd XMM3, XMM1 // 3/10 in xmm3
    addpd XMM2, XMM3 // 109/30 in xmm2
    mov eax, DWORD PTR pa11
    movupd XMM1, [eax] // 11 in xmm1
    divpd XMM2, XMM1 // 109/330 in xmm2
    mov eax, DWORD PTR pa99
    movupd XMM3, [eax] // 99 in xmm3
    movupd XMM4, XMM5 // 1 in xmm4
    divpd XMM4, XMM3 // 1/99 in xmm4
    divpd XMM5, XMM4 // 99 in xmm5
    addpd XMM1, XMM5 // 110 in xmm1
    mulpd XMM1, XMM2 // 109/3 in xmm1
    mov eax, DWORD PTR pa39
    movupd XMM2, [eax] // 39 in xmm2
    mulpd XMM1, XMM2 // 1417 in xmm1
    mov eax, DWORD PTR pres;
    movupd [eax], XMM1;
  }
  printf ("res = \n\t%8.8x%8.8x %8.8x%8.8x = \n\t%f %f\n",
    uint[3], uint[2], uint[1], uint[0], res[1], res[0]);
}

The result in IEEE double precision is

res =
	409623fffffffffe 409623fffffffffe =
	[...] [...]
1417 2ulp

res = ulp = = e = ε 2 = (ε 1 = )

FPU 53 FPU IEEE

FPU (FPU PC=11 )

#include <stdio.h>

void main ()
{
  float a3 = 3., a10 = 10., a11 = 11., a39 = 39., a99 = 99.;
  char *pa3, *pa10, *pa11, *pa39, *pa99; // pointers to single precision numbers
  unsigned short t[5], *pt;  // 10-byte (80-bit) result
  unsigned short cw, *pcw;   // control word and pointer to it
  float res;                 // result, used just to print the decimal value
  char *pres;

  pa3 = (char *)&a3;
  pa10 = (char *)&a10;
  pa11 = (char *)&a11;
  pa39 = (char *)&a39;
  pa99 = (char *)&a99;
  pt = t;
  pres = (char *)&res;
  pcw = &cw;

  // set control word
  cw = 0x033f;    // round to nearest, 64 bits, exceptions disabled
                  // (double-extended precision)
  // cw = 0x023f; // (use for pure IEEE double precision)
                  // round to nearest, 53 bits, exceptions disabled
  // cw = 0x003f; // (use for pure IEEE single precision)
                  // round to nearest, 24 bits, exceptions disabled

  __asm {
    mov eax, DWORD PTR pcw
    fldcw [eax]

    // compute E =
    fld1                     // 1 in st(0)
    mov eax, DWORD PTR pa10
    fdiv DWORD PTR [eax]     // 1/10 in st(0)
    fld1                     // 1 in st(0), 1/10 in st(1)
    mov eax, DWORD PTR pa3
    fdiv DWORD PTR [eax]     // 1/3 in st(0), 1/10 in st(1)
    fdivp st(1), st(0)       // 3/10 in st(0)
    fld1                     // 1 in st(0), 3/10 in st(1)
    fxch                     // 3/10 in st(0), 1 in st(1)
    fdivp st(1), st(0)       // 10/3 in st(0)
    mov eax, DWORD PTR pa3
    fld DWORD PTR [eax]      // 3 in st(0), 10/3 in st(1)
    mov eax, DWORD PTR pa10
    fdiv DWORD PTR [eax]     // 3/10 in st(0), 10/3 in st(1)
    faddp st(1), st(0)       // 109/30 in st(0)
    mov eax, DWORD PTR pa11
    fdiv DWORD PTR [eax]     // 109/330 in st(0)
    fld1                     // 1 in st(0), 109/330 in st(1)
    mov eax, DWORD PTR pa99
    fdiv DWORD PTR [eax]     // 1/99 in st(0), 109/330 in st(1)
    fld1                     // 1 in st(0), 1/99 in st(1), 109/330 in st(2)
    fxch                     // 1/99 in st(0), 1 in st(1), 109/330 in st(2)
    fdivp st(1), st(0)       // 99 in st(0), 109/330 in st(1)
    mov eax, DWORD PTR pa11
    fadd DWORD PTR [eax]     // 110 in st(0), 109/330 in st(1)
    fmulp st(1), st(0)       // 109/3 in st(0)
    mov eax, DWORD PTR pa39
    fmul DWORD PTR [eax]     // 1417 in st(0)
    mov eax, DWORD PTR pres
    fst DWORD PTR [eax]      // res (single precision) from the FPU stack to memory, no pop
    mov eax, DWORD PTR pt
    fstp TBYTE PTR [eax]     // res (80 bits) from the FPU stack to memory, pop st(0)
  }

  printf ("res = %4.4x%4.4x%4.4x%4.4x%4.4x\n", t[4], t[3], t[2], t[1], t[0]); // t =
  printf ("res = %6.6f\n", res);
}

IEEE

    res = 4009b
    res =

ulp res = ulp = = e = ε 3 =

ε 1 = > ε 2 = > ε 3 =
6.

FPU BCD ( ) SIMD SSE SSE2 IA-32 FPU SSE SSE2 IEEE IEEE ( SSE SSE2 ) IA-32 IEEE 1
