GPU Computing on Business

Slide 1. GPU Computing on Business
2010, Numerical Technologies Incorporated

Slide 10. [Figure: GPU Computing break-even chart; Revenue and Total Cost ($$$) vs. Quantity, with a low break-even point (BEP)]

Slide 16. [Figure: GPU Computing break-even chart; Revenue and Total Cost ($$$) vs. Quantity, with a high break-even point (BEP)]
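
The two break-even charts contrast a low and a high break-even point (BEP): the quantity at which revenue first covers total cost. The slides give no figures, so this is a minimal sketch assuming a linear model (a fixed cost plus a per-unit variable cost, sold at a fixed price). Under that assumption the BEP is the fixed cost divided by the per-unit margin. The function name `break_even_quantity` and all numbers are illustrative.

```cpp
#include <cassert>

// Linear break-even model (an assumption; the slides give no cost figures):
//   revenue    = price * q
//   total cost = fixed_cost + variable_cost * q
// Returns the quantity q at which revenue equals total cost.
double break_even_quantity(double fixed_cost, double price, double variable_cost) {
    assert(price > variable_cost);  // otherwise revenue never catches up with cost
    return fixed_cost / (price - variable_cost);
}
```

With a hypothetical fixed cost of 1000, a price of 10 and a variable cost of 6, the BEP is 250 units. Quadrupling the fixed cost to 4000 moves it to 1000 units, which matches the "high BEP" shape of the second chart.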

Slide 18. CUDA, C/C++, Perl, PHP, Python, Java, C#

Slide 20. GPU

Slide 25. [Diagram: NtParallel DLL, NVIDIA Tesla driver]

Slide 26. CUDA (NtParallel DLL)

Slide 27. [Diagram: CUDA (NtParallel DLL); NtParallel DLL, NVIDIA Tesla driver]

Slide 28. The GPU entry point is nt_parallel_for:
bool nt_parallel_for(for-loop body (string), for-loop count (int), array 1 (void*), array 2 (void*), ...)

Slide 29. CPU for-loop version:
    // Black-Scholes Option Formula Batch Processing Demo.
    #include <cmath>

    // NormDist: standard normal cumulative distribution function.
    static double NormDist(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

    void batch_black_scholes_pricer(
        int array_size, double* o_data, double* r_data, double* sigma_data,
        double* s_data, double* k_data, double* t_data)
    {
        // for-loop program.
        for (int i = 0; i < array_size; ++i) {
            double R = r_data[i];
            double Sigma = sigma_data[i];
            double S = s_data[i];
            double K = k_data[i];
            double T = t_data[i];
            double rt = R * T;
            double sigmasqrtt = Sigma * std::sqrt(T);
            double d = (std::log(S / K) + rt) / sigmasqrtt + sigmasqrtt * 0.5;
            o_data[i] = S * NormDist(d) - K * NormDist(d - sigmasqrtt) * std::exp(-rt);
        }
        // now you have output in o_data.
    }

Slide 31. The body of the for-loop:
    for (int i = 0; i < array_size; ++i) {
        double R = r_data[i];
        double Sigma = sigma_data[i];
        double S = s_data[i];
        double K = k_data[i];
        double T = t_data[i];
        double rt = R * T;
        double sigmasqrtt = Sigma * sqrt(T);
        double d = (log(S / K) + rt) / sigmasqrtt + sigmasqrtt * 0.5;
        o_data[i] = S * NormDist(d) - K * NormDist(d - sigmasqrtt) * exp(-rt);
    }
is rewritten as the kernel string that is passed to nt_parallel_for:
    string code = [](int i, double R, double Sigma, double S, double K, double T) => double {
        double rt = R * T;
        double sigmasqrtt = Sigma * sqrt(T);
        double d = (log(S / K) + rt) / sigmasqrtt + sigmasqrtt * 0.5;
        return S * NormDist(d) - K * NormDist(d - sigmasqrtt) * exp(-rt);
    };

Slide 32. GPU version using nt_parallel_for:
    // Black-Scholes Option Formula Batch Processing Demo.
    void batch_black_scholes_pricer(
        int array_size, double* o_data, double* r_data, double* sigma_data,
        double* s_data, double* k_data, double* t_data)
    {
        // The former loop body, as a kernel string.
        string code = [](int i, double R, double Sigma, double S, double K, double T) => double {
            double rt = R * T;
            double sigmasqrtt = Sigma * sqrt(T);
            double d = (log(S / K) + rt) / sigmasqrtt + sigmasqrtt * 0.5;
            return S * NormDist(d) - K * NormDist(d - sigmasqrtt) * exp(-rt);
        };
        // call the GPU.
        nt_parallel_for(code, array_size, o_data, r_data, sigma_data, s_data, k_data, t_data);
        // now you have output in o_data.
    }
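
The refactoring on slides 31-32 turns the loop body into a function of one element's inputs and hands it, together with the element count and the arrays, to nt_parallel_for. NtParallel's string-kernel syntax is its own, so this sketch only mirrors the idea in standard C++: an ordinary lambda for the per-element work and a hypothetical sequential driver, `map_for`, that takes its arguments in the same order as the slide's call. It assumes `NormDist` is the standard normal CDF, and it does not use the GPU or NtParallel.

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF, standing in for the slides' NormDist.
static double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// The former loop body as an ordinary C++ lambda: one option's inputs in,
// one call price out. It touches nothing outside its own arguments.
auto price_one = [](double R, double Sigma, double S, double K, double T) {
    double rt = R * T;
    double sigmasqrtt = Sigma * std::sqrt(T);
    double d = (std::log(S / K) + rt) / sigmasqrtt + sigmasqrtt * 0.5;
    return S * norm_cdf(d) - K * norm_cdf(d - sigmasqrtt) * std::exp(-rt);
};

// Hypothetical CPU driver with the same argument order as the slide's
// nt_parallel_for call: body, count, output array, then input arrays.
// It runs the body sequentially and returns false on a negative count.
template <typename Body>
bool map_for(Body body, int array_size, double* o_data, const double* r_data,
             const double* sigma_data, const double* s_data,
             const double* k_data, const double* t_data) {
    if (array_size < 0) return false;
    for (int i = 0; i < array_size; ++i)
        o_data[i] = body(r_data[i], sigma_data[i], s_data[i], k_data[i], t_data[i]);
    return true;
}
```

For S = 100, K = 100, R = 0.05, Sigma = 0.2 and T = 1 the element price comes out near 10.45, the textbook Black-Scholes call value for those inputs.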

Slide 33. Excel: GPU code is here!

Slide 34. 10

Slide 35. GPU, CPU, GPU

Slide 36. [Diagram: NVIDIA (NVIDIA Tesla driver), NtParallel DLL, other companies' drivers, SSE/AVX]

Slide 37.
Scientific:
1. Mandelbrot
2. Kirkwood Gaps
3. Wavelet Analysis
Wall St.:
4. Binomial Tree Option Model
5. Black-Scholes Option Model
Boring Accounting Stuff:
6. Housing Loan Calculation

Slide 38. Mandelbrot
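
Mandelbrot is a natural first benchmark because every pixel is computed independently from its own coordinate. The benchmark's image size and iteration limit are not given; this sketch shows only the per-point escape-time loop such a kernel would run, with `max_iter` as an assumed parameter and `escape_count` as an illustrative name.

```cpp
#include <cassert>
#include <complex>

// Escape-time iteration for one point c: z <- z*z + c, starting from z = 0.
// Returns how many updates ran before |z| was found to exceed 2, or
// max_iter if the point never escaped (treated as inside the set).
int escape_count(std::complex<double> c, int max_iter) {
    assert(max_iter > 0);
    std::complex<double> z = 0.0;
    for (int n = 0; n < max_iter; ++n) {
        if (std::norm(z) > 4.0) return n;  // std::norm gives |z|^2
        z = z * z + c;
    }
    return max_iter;
}
```

For example, escape_count({2.0, 0.0}, 100) returns 2, while points inside the set, such as 0 or -1, return max_iter.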

Slide 39. Kirkwood Gaps: 98
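
The Kirkwood gaps are thinly populated bands in the asteroid belt at mean-motion resonances with Jupiter. The slides do not describe the benchmark's orbit integrator, so as a smaller worked example of where the gaps fall: by Kepler's third law (orbital period squared proportional to semi-major axis cubed), an asteroid that completes p orbits for every q of Jupiter's sits at a = a_J (q/p)^(2/3). The sketch assumes a_J ≈ 5.2 AU; `resonance_axis_au` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>

// Semi-major axis (AU) of a p:q mean-motion resonance with Jupiter.
// Kepler's third law: (P / P_J)^2 = (a / a_J)^3, and P / P_J = q / p.
double resonance_axis_au(int p, int q, double jupiter_axis_au = 5.2) {
    assert(p > 0 && q > 0);
    return jupiter_axis_au * std::pow(static_cast<double>(q) / p, 2.0 / 3.0);
}
```

The 3:1 resonance gives about 2.50 AU and the 2:1 about 3.28 AU, close to where the two most prominent gaps are observed.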

Slide 40. Wavelet Analysis: 29
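
The slides do not say which wavelet the benchmark uses. As an illustrative stand-in, one level of the Haar transform shows why wavelet analysis parallelizes well: each output pair depends only on one input pair. This sketch uses the unnormalized average/half-difference convention (an assumption; the orthonormal form divides by √2 instead of 2), and `haar_step` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One level of the unnormalized Haar wavelet transform.
// Output layout: pairwise averages first, then pairwise half-differences.
// The input length must be even.
std::vector<double> haar_step(const std::vector<double>& x) {
    assert(x.size() % 2 == 0);
    const std::size_t half = x.size() / 2;
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < half; ++i) {
        out[i] = (x[2 * i] + x[2 * i + 1]) / 2.0;         // approximation
        out[half + i] = (x[2 * i] - x[2 * i + 1]) / 2.0;  // detail
    }
    return out;
}
```

For input {4, 2, 6, 6} it returns {3, 6, 1, 0}: the averages first, then the details.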

Slide 41. Binomial Tree Option Model: 7
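
A binomial tree model values an option by stepping a price lattice backward from expiry. The slides do not say which lattice the benchmark uses; this sketch assumes the Cox-Ross-Rubinstein tree for a European call, with `binomial_call` as an illustrative name. Each option is an independent work item, but inside one option the backward sweep is sequential and its working storage grows linearly with `steps`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Cox-Ross-Rubinstein binomial tree for a European call option.
double binomial_call(double S, double K, double R, double Sigma, double T, int steps) {
    assert(steps > 0 && Sigma > 0.0 && T > 0.0);
    const double dt = T / steps;
    const double u = std::exp(Sigma * std::sqrt(dt));  // up factor
    const double d = 1.0 / u;                           // down factor
    const double disc = std::exp(-R * dt);              // one-step discount
    const double p = (std::exp(R * dt) - d) / (u - d);  // risk-neutral up probability

    // Payoffs at expiry: node j has j up-moves and (steps - j) down-moves.
    std::vector<double> v(steps + 1);
    for (int j = 0; j <= steps; ++j)
        v[j] = std::max(S * std::pow(u, j) * std::pow(d, steps - j) - K, 0.0);

    // Backward induction, in place: v[j] reads the old v[j] and v[j + 1].
    for (int n = steps - 1; n >= 0; --n)
        for (int j = 0; j <= n; ++j)
            v[j] = disc * (p * v[j + 1] + (1.0 - p) * v[j]);
    return v[0];
}
```

With S = 100, K = 100, R = 0.05, Sigma = 0.2, T = 1 and 1000 steps the result lands within 0.02 of the Black-Scholes value of about 10.45.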

Slide 42. Black-Scholes Option Model: 5

Slide 43. 40 : 35

Slide 46. GPU Computing

Slide 47. GPU API, nvcc

Slide 48. nt_parallel_for: shared nothing, CUDA
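
Shared nothing here means that each iteration reads its own inputs and writes only its own output slot, with no state carried from one iteration to the next. Under that constraint a loop can be divided among workers without locks. This sketch illustrates the constraint on a CPU with std::thread; `shared_nothing_for` is a hypothetical helper and says nothing about how NtParallel schedules work on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Runs body(i) for every i in [0, n) on up to `workers` threads, each thread
// handling one contiguous block of indices. No locks are used, so this is
// correct only when body(i) touches no state that another index touches.
template <typename Body>
void shared_nothing_for(int n, int workers, Body body) {
    assert(n >= 0 && workers > 0);
    std::vector<std::thread> pool;
    const int chunk = (n + workers - 1) / workers;  // ceiling division
    for (int w = 0; w < workers; ++w) {
        const int begin = w * chunk;
        const int end = std::min(n, begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([=]() mutable {
            for (int i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& t : pool) t.join();
}
```

A body such as `[&](int i) { out[i] = in[i] * 2.0; }` qualifies. A body that accumulates into one shared variable does not, and would need to be rewritten, for example as a per-element result followed by a separate reduction.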

Slides 52-55. [Figure: Relative performance vs. for-loop iterations; acceleration ratio plotted against iterations, comparing C/C++, C#/Java/Perl/Python and GPU]

Slide 57. for-loop iterations, Tesla GPU

Slide 58. Complexity (Ops/Bytes)
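
Complexity here is operations per byte moved, often called arithmetic intensity. As a rough worked example for the two kernels shown on slide 68, bytes can be counted from their parameter lists and operations from their bodies. The byte sizes assume an 8-byte double and a 4-byte int, with the index `i` not read from memory. The operation counts are rough assumptions that count each math-library call as one operation, and all names are illustrative.

```cpp
#include <cassert>

// Arithmetic intensity ("Complexity" on the slides): operations per byte moved.
double ops_per_byte(double ops, double bytes) {
    assert(bytes > 0.0);
    return ops / bytes;
}

// Assumed op count for the housing-loan kernel: 3 ops to build the monthly
// discount factor, then 3 ops (2 multiplies, 1 add) per month of term.
double housing_loan_ops(int term_months) { return 3.0 + 3.0 * term_months; }

// Bytes per element, assuming 8-byte double and 4-byte int.
// Black-Scholes: 5 double inputs + 1 double output.
constexpr double kBlackScholesBytes = 6 * 8.0;       // 48 bytes
// Housing loan: 2 double inputs + 1 int input + 1 double output.
constexpr double kHousingLoanBytes = 3 * 8.0 + 4.0;  // 28 bytes
// Rough Black-Scholes op count, each math-library call counted as one op.
constexpr double kBlackScholesOps = 18.0;
```

For an assumed 35-year (420-month) term this gives about 45 ops/byte for the housing-loan kernel against under 1 for Black-Scholes, which points the same way as the x40 and x5 speedups reported for them.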

Slides 59-66. [Figure, built up over several slides: Complexity vs. for-loop Iterations, with contours at x5, x10 and x50 faster and regions labeled "GPU has advantage" and "CPU has advantage". Slides 62-63 plot Housing Loan and mark a Break Even Point; slides 64-65 plot Binomial Tree with the note "GPU resource exhausted"; slide 66 plots Mandelbrot, Kirkwood, Housing Loan, Wavelet, Binomial Tree and Black-Scholes.]

Slide 67. GPU speedup: x5 for Black-Scholes, x40 for Housing Loan

Slide 68. Complexity...
Housing Loan, x40:
    string code = [](int i, double annualizedrate, int term, double monthlypayment) => double {
        const int t = term;
        const double m = monthlypayment;
        const double monthlydf = 1.0 / (1.0 + annualizedrate / 12.0);
        int elapsedmonth;
        double val = 0.0;
        double df = 1.0;
        for (elapsedmonth = 0; elapsedmonth < t; elapsedmonth = elapsedmonth + 1) {
            df = df * monthlydf;
            val = val + m * df;
        }
        return val;
    };
Black-Scholes, x5:
    string code = [](int i, double R, double Sigma, double S, double K, double T) => double {
        double rt = R * T;
        double sigmasqrtt = Sigma * sqrt(T);
        double d = (log(S / K) + rt) / sigmasqrtt + sigmasqrtt * 0.5;
        return S * NormDist(d) - K * NormDist(d - sigmasqrtt) * exp(-rt);
    };
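
The housing-loan kernel sums a level monthly payment discounted over `term` months, which is a geometric series with a closed form. A plain C++ transcription of the slide's loop makes it easy to check against that closed form, m(1 - v^t)/i with monthly rate i = r/12 and v = 1/(1 + i). The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Loop version: the same arithmetic as the slide's housing-loan kernel.
double loan_pv_loop(double annualizedrate, int term, double monthlypayment) {
    assert(term >= 0);
    const double monthlydf = 1.0 / (1.0 + annualizedrate / 12.0);
    double val = 0.0;
    double df = 1.0;
    for (int elapsedmonth = 0; elapsedmonth < term; ++elapsedmonth) {
        df *= monthlydf;             // discount factor for month elapsedmonth + 1
        val += monthlypayment * df;  // add that month's discounted payment
    }
    return val;
}

// Closed form of the same geometric series (requires a positive rate).
double loan_pv_closed(double annualizedrate, int term, double monthlypayment) {
    assert(annualizedrate > 0.0);
    const double i = annualizedrate / 12.0;  // monthly rate
    const double v = 1.0 / (1.0 + i);        // monthly discount factor
    return monthlypayment * (1.0 - std::pow(v, term)) / i;
}
```

The loop costs a few operations per month of term while its inputs and output stay at a few bytes, which is what places this kernel high on the Complexity axis.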

Slide 69. Q. GPU, Complexity, Iterations

Slide 70. A. IFRS ECF, Complexity, Housing Loan

Slides 71-72. [Figure: the same Complexity vs. for-loop Iterations chart (x5, x10 and x50 faster contours; Break Even Point; "GPU has advantage" and "CPU has advantage" regions), now adding Annuities and IFRS ECF depletion models (estimated) alongside Mandelbrot, Kirkwood, Housing Loan, Wavelet, Binomial Tree and Black-Scholes.]

Slide 73. Rule of Thumb...

Slide 74. nt_parallel_for
