Accelerating Large-Scale Graph Processing with Memory-Hierarchy Awareness

1 Yasui (Chuo University, CREST); ERATO

2 Outline
NETAL (NETwork Analysis Library)
NUMA-aware BFS
Graph500, Green Graph500
  Kronecker graph
  Level-synchronized parallel BFS
  Hybrid algorithm for parallel BFS
  NUMA-aware hybrid parallel BFS
  5th Graph500 / (1st) Green Graph500

3 GraphCT: BC (betweenness centrality) only
  USA-road-d.LKS.gr (n = 2.76M, m = 6.89M): 0.6
  cit-patents (n = 3.77M, m = 16.5M): .6
  (CRAY XMT)
NETAL: CC, GC, SC, BC
  USA-road-d.LKS.gr (n = 2.76M, m = 6.89M): 9.4
  cit-patents (n = 3.77M, m = 16.5M): .5
  NUMA, Intel/AMD

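4 Brandes' algorithm*
closeness (CC): $C_C(v) = \frac{1}{\sum_{t \in V} d_G(v,t)}$
graph (GC): $C_G(v) = \frac{1}{\max_{t \in V} d_G(v,t)}$
stress (SC): $C_S(v) = \sum_{s \neq v \neq t \in V} \sigma_{st}(v)$
betweenness (BC): $C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$
Shortest-path routines: multipathBFS, multipathSSSP
Line 6: ShortestPath (SP) phase. Lines 8-20: UPdate (UP) phase.
Input: $G = (V, E)$
Output: $C_C[v], C_G[v], C_S[v], C_B[v], \forall v \in V$ (initialized to 0)
 1: for $s \in V$ parallel do
 2:   /* $\sigma[v]$: number of shortest paths from $s$ to $v$ */
 3:   /* $S$: stack of visited vertices */
 4:   /* $P[v]$: predecessors of $v$ on shortest paths */
 5:   /* $d_G[v]$: distance from $s$ to $v$ */
 6:   $\sigma, S, P, d_G \leftarrow$ multipathBFS($G$, $s$)
 7:
 8:   $C_C[s] \leftarrow 1 / \sum_{t \in V} d_G(s,t)$
 9:   $C_G[s] \leftarrow 1 / \max_{t \in V} d_G(s,t)$
10:   while $S \neq \emptyset$ do
11:     pop $w \leftarrow S$
12:     for $v \in P[w]$ do
13:       $\delta_S[v] \leftarrow \delta_S[v] + (1 + \delta_S[w])$
14:       $\delta_B[v] \leftarrow \delta_B[v] + \frac{\sigma[v]}{\sigma[w]} (1 + \delta_B[w])$
15:     end for
16:     if $w \neq s$ then
17:       $C_S[w] \leftarrow C_S[w] + \sigma[w] \cdot \delta_S[w]$
18:       $C_B[w] \leftarrow C_B[w] + \delta_B[w]$
19:     end if
20:   end while
21: end for
*U. Brandes, A Faster Algorithm for Betweenness Centrality (2001)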
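The pseudocode above can be sketched in Python for unweighted graphs. This is a minimal sequential sketch, not NETAL's implementation: the names `multipath_bfs` and `centralities` and the adjacency-dictionary input are assumptions made for illustration, and the parallel loop over sources is written as a plain loop. Unreachable vertices are simply left out of the closeness and graph centrality sums.

```python
from collections import deque


def multipath_bfs(adj, s):
    """SP phase: BFS from s that records all shortest paths (hypothetical helper)."""
    sigma = {v: 0 for v in adj}    # sigma[v]: number of shortest s-v paths
    dist = {v: -1 for v in adj}    # dist[v]: hop distance, -1 if unreachable
    preds = {v: [] for v in adj}   # preds[v]: predecessors on shortest paths
    stack = []                     # vertices in non-decreasing distance order
    sigma[s], dist[s] = 1, 0
    queue = deque([s])
    while queue:
        v = queue.popleft()
        stack.append(v)
        for w in adj[v]:
            if dist[w] < 0:                # first visit
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:     # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return sigma, stack, preds, dist


def centralities(adj):
    """Closeness, graph, stress and betweenness centrality for every vertex."""
    cc = {v: 0.0 for v in adj}
    gc = {v: 0.0 for v in adj}
    sc = {v: 0.0 for v in adj}
    bc = {v: 0.0 for v in adj}
    for s in adj:                          # "parallel do" in the slide
        sigma, stack, preds, dist = multipath_bfs(adj, s)
        reached = [d for d in dist.values() if d > 0]
        if reached:
            cc[s] = 1.0 / sum(reached)
            gc[s] = 1.0 / max(reached)
        delta_s = {v: 0.0 for v in adj}
        delta_b = {v: 0.0 for v in adj}
        while stack:                       # UP phase: farthest vertices first
            w = stack.pop()
            for v in preds[w]:
                delta_s[v] += 1 + delta_s[w]
                delta_b[v] += sigma[v] / sigma[w] * (1 + delta_b[w])
            if w != s:
                sc[w] += sigma[w] * delta_s[w]
                bc[w] += delta_b[w]
    return cc, gc, sc, bc
```

On an undirected graph every pair is counted once per direction, so the stress and betweenness values are twice the unordered-pair counts.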

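5 $G = (V, E)$, $n = |V|$, $m = |E|$, edge length $l : E \to \mathbb{R}_+$
SSSP: single-source shortest paths (unweighted case: BFS)
MSSP: multi-source shortest paths ($\beta$ SSSPs at a time), $n/\beta$ runs
APSP: all-pairs shortest paths, $n \times n$ ($n$ SSSPs)
Variants: distanceSSSP (distances only), singlepathSSSP (distances and a single shortest path), multipathSSSP (distances and all shortest paths)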
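The multipath variant can be illustrated with Dijkstra's algorithm on a binary heap. This is a minimal sketch under stated assumptions: the function name `multipath_sssp`, the `(neighbor, length)` adjacency lists, strictly positive edge lengths, and mutually comparable vertex labels (needed to break ties in the heap) are all choices made here, not details taken from the slides.

```python
import heapq


def multipath_sssp(adj, s):
    """Dijkstra with a binary heap that keeps every shortest path.

    adj[v] is a list of (neighbor, length) pairs with length > 0.
    Returns distances, shortest-path counts, predecessor lists and
    the order in which vertices were settled.
    """
    dist = {v: float("inf") for v in adj}
    sigma = {v: 0 for v in adj}
    preds = {v: [] for v in adj}
    order = []
    done = set()
    dist[s], sigma[s] = 0.0, 1
    heap = [(0.0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue                     # stale heap entry, already settled
        done.add(v)
        order.append(v)
        for w, length in adj[v]:
            nd = d + length
            if nd < dist[w]:             # strictly better: restart the path set
                dist[w] = nd
                sigma[w] = sigma[v]
                preds[w] = [v]
                heapq.heappush(heap, (nd, w))
            elif nd == dist[w] and w not in done:   # equally good: add paths
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order
```

A distance-only variant would drop `sigma` and `preds`, and a single-path variant would keep one predecessor per vertex instead of a list.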

6 -HEAP: Dijkstra's algorithm
9th DIMACS Implementation Challenge; MLB
singlepathSSSP, n = 23.95M, m = 58.3M
2-way Xeon X5460 3.16GHz (4 cores x 2)
implementation          SSSP CPU time [ms] (speedup)   memory [GB]
-HEAP* (sequential)     (.00)
-HEAP* (2 threads)      74.4 (.94)                     .7
-HEAP* (4 threads)      45. (.65)                      .00
-HEAP* (8 threads)      0.7 (5.4)                      .46
MLB (sequential)

7 Bottlenecks by thread placement
bottleneck                      diff-procs   diff-l2   same-l2
processor bandwidth             down         down      down
processor inside bandwidth      -            down      down
L2 cache sharing                -            -         down
diff-procs: different processors
diff-l2: same processor, different L2 caches
same-l2: same processor, same L2 cache
2-way Xeon X5460, SSSP (USA-road-d.USA.gr)
implementation    sequential         diff-procs         diff-l2            same-l2
-HEAP*            5.4 s (±0.00%)     5.44 s (-.9%)      5.6 s (-5.05%)     6.6 s (-8.94%)
d-heap            7. s (±0.00%)      7.6 s (-0.4%)      7.59 s (-4.74%)    8.79 s (-7.75%)
Fib-heap          5.95 s (±0.00%)    6.09 s (-0.87%)    6.56 s (-.68%)     8.7 s (-.%)
Dial's            4.8 s (±0.00%)     4.54 s (-.5%)      5.0 s (-.58%)      6.8 s (-.5%)
double buckets    4.65 s (±0.00%)    4.88 s (-4.7%)     5.5 s (-.4%)       6.64 s (-9.97%)
MLB               5.69 s (±0.00%)    5.85 s (-.74%)     6.7 s (-7.78%)     7.7 s (-6.9%)
Δ-stepping        .74 s (±0.00%)     .06 s (-.66%)      .55 s (-6.4%)      6.49 s (-8.76%)

8 NETAL (NETwork Analysis Library)
APSP and centrality (CC, GC, SC, BC) on NUMA systems
All-pairs shortest paths (APSP):
  n-BFS: BFS (multipathBFS), n runs
  n-Dijkstra: Dijkstra's with binary heap (multipathSSSP), n runs
  n/β-MSLC: MSLC with binary heap (distanceMSSP), n/β runs
Centrality:
  n-BFS: BFS, unweighted CC, GC, SC, BC
  n-Dijkstra: Dijkstra's with binary heap, weighted CC, GC, SC, BC
  The 4 centrality indices are computed from multipath searches in parallel
Y. Yasui et al.: NETAL: High-Performance Implementation of Network Analysis Library Considering Computer Memory Hierarchy, 2011.

9 NUMA
4-way Opteron 6174 (12 cores x 4 sockets), APSP
Placement of graph data relative to CPU/memory affinity: worst: 48 -affinity; best: 6 8-affinity
USA-road-d.NY.gr (n = 264K, m = 734K): TEPS (speedup)
threads       affinity   n-BFS            n-Dijkstra       n/β-MSLC (β = )
sequential               0.5 M (.0)       0.8 M (.0)       .4 M (.0)
12 threads    worst      9.4 M (.7)       0. M (.)         6.6 M (9.)
              best       99. M (4.6)      4. M (.4)        40.9 M (.7)
24 threads    worst      87.8 M (8.9)     .0 M (9.7)       M (6.8)
              best       M (6.8)          49.8 M (.)       64.0 M (.5)
48 threads    worst      M (7.5)          5.0 M (.6)       M (.)
              best       M (46.)          47.7 M (4.6)     47.5 M (5.7)

10 4-way Opteron 6174 (12 cores x 4 sockets)
[Figure: CPU time [seconds] per trial for n-BFS, n-SSSP and n/β-MSSP, each with best and worst CPU/memory affinity]

11 All-pairs shortest paths (APSP)
USA-road-d.USA.gr: n = 23.95M, m = 58.3M
soc-LiveJournal: n = 4.85M, m = 68.99M
distanceAPSP uses n/β-MSLC; multipathSSSP uses n-Dijkstra; compared against LS-BFS, MLB and Δ-stepping
              USA-road-d.USA.gr          soc-LiveJournal
n-BFS         70 days                    7.5 days
n-Dijkstra    99 days                    9.6 days
n/β-MSLC      7.75 days (β = 6)          .78 days (β = )
LS-BFS        557 days (= .5 years)      79.5 days
MLB           774 days (= 4.9 years)     0.55 days
Δ-stepping    5 days (= 9. years)        88. days

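12 Centrality indices (USA-road-d.LKS.gr, n = 2.76M, m = 6.89M)
NETAL computes 4 centrality indices: $C_C$, $C_G$, $C_S$, $C_B$
Unweighted centrality (top row): 9.4 hours with n-BFS. Weighted centrality (bottom row): .8 hours with n-Dijkstra.
GraphCT takes 0.6 days for BC alone, ignoring edge lengths (4-way Opteron 6174)
closeness: $C_C(v) = \frac{1}{\sum_{t \in V} d_G(v,t)}$
graph: $C_G(v) = \frac{1}{\max_{t \in V} d_G(v,t)}$
stress: $C_S(v) = \sum_{s \neq v \neq t \in V} \sigma_{st}(v)$
betweenness: $C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$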

13 n-BFS and n-Dijkstra compared with GraphCT
NETAL computes $C_C, C_G, C_S, C_B$; GraphCT computes $C_B$ only
instance             n      m       n-BFS ($C_C,C_G,C_S,C_B$)    n-Dijkstra (weighted $C_C,C_G,C_S,C_B$)   GraphCT ($C_B$)
USA-road-d.LKS.gr    2.8M   6.9M    9.5 h (SP 55 %, UP 45 %)     .84 h (SP 69 %, UP 31 %)                  49.8 h
cit-patents          3.8M   16.5M   .87 h (SP 7 %, UP 7 %)       .5 h (SP 40 %, UP 60 %)                   .6 h
SP: ShortestPath phase; UP: UPdate phase
SSCA#2 ($C_B$): n-BFS .8, n-Dijkstra .4 relative to GraphCT
instance                         n-BFS                       n-Dijkstra                  GraphCT ($C_B$)              SSCA#2 ($C_B$)
R-MAT (n = 6.78M, m = 4.M)       6.0 seconds, 0.8 MTEPS      60.5 seconds, .9 MTEPS      60.0 seconds, 48.5 MTEPS     error

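14 Graph500, Green Graph500
Graph500: ranks BFS performance by TEPS (Traversed Edges Per Second)
  Runs 64 BFS searches on a Kronecker graph
  Parameters: SCALE and edgefactor (= 16), with $n = 2^{SCALE}$, $m = edgefactor \times n$
  Each BFS is validated and its TEPS recorded; the score is the median TEPS of the 64 runs
Green Graph500: ranks BFS energy efficiency by TEPS/kW, measured during the BFS energy loop
  Remote PDU: Omron RC008
Steps: Graph Generation, Graph Construction, BFS, Validation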
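The size and score definitions above amount to a few lines of arithmetic. The sketch below is illustrative only: the function names are assumptions, and `runs` stands for whatever per-search measurements a benchmark driver would collect.

```python
import statistics


def graph500_size(scale, edgefactor=16):
    """Number of vertices and edges for a given SCALE and edgefactor."""
    n = 2 ** scale
    m = edgefactor * n
    return n, m


def teps(traversed_edges, seconds):
    """Traversed edges per second for one BFS."""
    return traversed_edges / seconds


def median_teps(runs):
    """Median TEPS over (traversed_edges, seconds) pairs, one per BFS root."""
    return statistics.median(teps(e, t) for e, t in runs)


def teps_per_kw(teps_value, watts):
    """Energy efficiency in the TEPS/kW form used on the slide."""
    return teps_value / (watts / 1000.0)
```

For example, `graph500_size(26)` gives 67,108,864 vertices and 1,073,741,824 generated edges.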

15 Kronecker graph
A Kronecker graph of a given SCALE is generated by taking the Kronecker product of an initiator matrix with itself SCALE times.
Example: SCALE 26, edgefactor 16: $n = 2^{26} = 67.1$M, m = 47.5M
[Figure: degree distribution; axes: node degree, number of nodes]
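Edge sampling for such a graph can be sketched as follows. This is a simplified illustration, not the benchmark's reference generator: the function name and the seed parameter are assumptions, the initiator probabilities 0.57, 0.19, 0.19 (and the remaining 0.05) are the values given in the Graph500 specification rather than ones recovered from this slide, and the vertex relabeling and edge shuffling that the specification also requires are omitted.

```python
import random


def kronecker_edges(scale, edgefactor=16, a=0.57, b=0.19, c=0.19, seed=0):
    """Sample edgefactor * 2**scale edges by recursive quadrant selection."""
    rng = random.Random(seed)
    n = 1 << scale
    edges = []
    for _ in range(edgefactor * n):
        u = v = 0
        for _ in range(scale):        # choose one bit of u and v per level
            r = rng.random()
            u <<= 1
            v <<= 1
            if r < a:                 # top-left quadrant: both bits 0
                pass
            elif r < a + b:           # top-right quadrant
                v |= 1
            elif r < a + b + c:       # bottom-left quadrant
                u |= 1
            else:                     # bottom-right quadrant
                u |= 1
                v |= 1
        edges.append((u, v))
    return edges
```

Because the top-left quadrant is favored at every level, low-numbered vertices collect many more edges than the rest, which produces the skewed degree distribution shown in the figure.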

16 Level-synchronized parallel BFS
BFS proceeds one level at a time, with the vertices of each level processed in parallel; updates to shared data use atomic operations.
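The level structure can be shown with a sequential sketch. The function name and dictionary-based graph are assumptions; in a real parallel version the loop over the frontier would be split across threads and the parent check-and-set would be an atomic compare-and-swap, which plain Python does not express.

```python
def level_synchronized_bfs(adj, root):
    """BFS that finishes each level before starting the next."""
    parent = {root: root}             # visited set and BFS tree in one
    frontier = [root]
    levels = [frontier]
    while frontier:
        next_frontier = []
        for v in frontier:            # parallel version: split across threads
            for w in adj[v]:
                if w not in parent:   # parallel version: atomic test-and-set
                    parent[w] = v
                    next_frontier.append(w)
        frontier = next_frontier      # implicit barrier between levels
        if frontier:
            levels.append(frontier)
    return parent, levels
```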

17 Hybrid Algorithm for Parallel BFS [Beamer et al.]
Direction-Optimizing Breadth-First Search
Depending on the size of the frontier, each level is processed by one of two searches:
  forward search (top-down step)
  backward search (bottom-up step)

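18 Hybrid Algorithm
Switching between Top-down and Bottom-up:
  Top-down to Bottom-up: when $m_f > m_u / \alpha$
  Bottom-up to Top-down: when $n_f < n / \beta$
$m_f$: number of edges incident to the frontier; $m_u$: number of edges incident to unvisited vertices; $n_f$: number of vertices in the frontier; $n$: number of vertices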
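The switching rule can be sketched as a sequential direction-optimizing BFS. The function name, the default values of `alpha` and `beta`, and the returned list of directions are illustrative assumptions, and the graph is assumed undirected (symmetric adjacency lists) so that the bottom-up step can look for a parent among a vertex's own neighbors.

```python
def hybrid_bfs(adj, root, alpha=14, beta=24):
    """BFS that picks top-down or bottom-up for each level."""
    n = len(adj)
    parent = {root: root}
    frontier = {root}
    # m_u: edges incident to vertices that are still unvisited
    m_u = sum(len(adj[v]) for v in adj) - len(adj[root])
    top_down = True
    directions = []
    while frontier:
        m_f = sum(len(adj[v]) for v in frontier)   # edges leaving the frontier
        n_f = len(frontier)
        if top_down and m_f > m_u / alpha:
            top_down = False
        elif not top_down and n_f < n / beta:
            top_down = True
        directions.append("top-down" if top_down else "bottom-up")
        next_frontier = set()
        if top_down:
            # each frontier vertex claims its unvisited neighbors
            for v in frontier:
                for w in adj[v]:
                    if w not in parent:
                        parent[w] = v
                        next_frontier.add(w)
        else:
            # each unvisited vertex searches for any parent in the frontier
            for w in adj:
                if w not in parent:
                    for v in adj[w]:
                        if v in frontier:
                            parent[w] = v
                            next_frontier.add(w)
                            break          # one parent is enough
        m_u -= sum(len(adj[w]) for w in next_frontier)
        frontier = next_frontier
    return parent, directions
```

The bottom-up step stops scanning a vertex's neighbors as soon as one parent is found, which is where the savings come from when the frontier covers a large share of the graph.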

19 NUMA-aware Hybrid Algorithm
Intel(R) Xeon(R) CPU (HT enabled), (α = 0, β = 4)
[Figure: Traversed Edges Per Second (TEPS) versus #threads (SCALE=6), series edgefactor = 8, 16, 32, 64]
[Figure: Traversed Edges Per Second (TEPS) versus SCALE (#threads=), series edgefactor = 8, 16, 32, 64]

20 Graph500 / Green Graph500
[Table: per machine, SCALE, n, m, BFS performance (TEPS), power (W), TEPS/kW; numeric entries not recoverable]
Machines:
  WestmereEX (80): Xeon E7-4870 @ 2.40GHz (10 cores) x 4
  SandyBridgeEP: Xeon E5-2690 @ 2.90GHz (6 cores) x
  MagnyCours (48): Opteron 6174 @ 2.20GHz (12 cores) x 4
  SandyBridgeEP: Xeon E5-690 @ .00GHz (6 cores) x
  Xeon E5-60 @ .0GHz ( threads) x
  WestmereEP (4): Xeon X5670 @ 2.93GHz ( cores) x
  Core i7-3820QM @ 2.70GHz ( cores)
  Core i7-3820QM @ 2.70GHz ( cores)


22 1st Green Graph500 list
GraphCREST, ISC
GraphCREST-Tegra3: scale0, 0.99 TEPS
ASUS Pad TF700T (NVIDIA Tegra 3, 1.7GHz, 4 cores), 1 node

23 NETAL (NETwork Analysis Library)
  NUMA-aware, for Intel/AMD systems
  APSP: .5, 9., 7.75
  CC, GC, SC, BC: 9.4 (GraphCT: 0.6)
  CC, GC, SC, BC: .5 (GraphCT: .6)
Graph500
  BFS on Kronecker graphs (small-world, scale-free)
  Single node with HT, 80 threads: 0.90 TEPS
Green Graph500: 5
