Speeding Up Large-Scale Graph Processing by Taking the Memory Hierarchy into Account
- めぐの ふしはら
Transcription
1 Speeding Up Large-Scale Graph Processing by Taking the Memory Hierarchy into Account. Yasui (Chuo University, CREST). ERATO.
2 Outline. Part 1: NETAL (NETwork Analysis Library). Part 2: NUMA-aware BFS: Graph500 and Green Graph500; Kronecker graph; level-synchronized parallel BFS; hybrid algorithm for parallel BFS; NUMA-aware hybrid parallel BFS; results on the 5th Graph500 list and the (1st) Green Graph500 list.
3 Centrality computation on large graphs.
GraphCT computes BC (betweenness centrality) only (CRAY XMT): USA-road-d.LKS.gr (n = 2.76M, m = 6.89M): 0.6 days; cit-patents (n = 3.77M, m = 16.5M): .6 h.
NETAL computes CC, GC, SC, and BC: USA-road-d.LKS.gr (n = 2.76M, m = 6.89M): 9.4 hours; cit-patents (n = 3.77M, m = 16.5M): .5 h.
NETAL targets NUMA machines with Intel/AMD processors.
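4 Brandes algorithm*
Centrality indices:
closeness (CC): $C_C(v) = \dfrac{1}{\sum_{t \in V} d_G(v,t)}$
graph (GC): $C_G(v) = \dfrac{1}{\max_{t \in V} d_G(v,t)}$
stress (SC): $C_S(v) = \sum_{s \neq v \neq t \in V} \sigma_{st}(v)$
betweenness (BC): $C_B(v) = \sum_{s \neq v \neq t \in V} \dfrac{\sigma_{st}(v)}{\sigma_{st}}$
Shortest-path routines: multipathBFS, multipathSSSP.
Line 6: shortest-path phase. Lines 8-20: update phase.
Input: $G = (V, E)$
Output: $C_C[v]$, $C_G[v]$, $C_S[v]$, $C_B[v]$ for $v \in V$ (initialized to 0)
1: for s ∈ V parallel do
2:   /* σ[v]: number of shortest paths to v */
3:   /* S: stack of visited vertices */
4:   /* P[v]: predecessors of v */
5:   /* d_G[v]: distance to v */
6:   σ, S, P, d_G ← multipathBFS(G, s)
7:
8:   C_C[s] ← 1 / Σ_{t ∈ V} d_G(s,t)
9:   C_G[s] ← 1 / max_{t ∈ V} d_G(s,t)
10:  while S ≠ ∅ do
11:    pop w ← S
12:    for v ∈ P[w] do
13:      δ_S[v] ← δ_S[v] + (1 + δ_S[w])
14:      δ_B[v] ← δ_B[v] + (σ[v] / σ[w]) · (1 + δ_B[w])
15:    end for
16:    if w ≠ s then
17:      C_S[w] ← C_S[w] + σ[w] · δ_S[w]
18:      C_B[w] ← C_B[w] + δ_B[w]
19:    end if
20:  end while
21: end for
* U. Brandes, A Faster Algorithm for Betweenness Centrality (2001)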
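The two phases of the Brandes-style computation on this slide can be sketched in Python as follows. This is a minimal sequential sketch, not NETAL's implementation: the function names `multipath_bfs` and `centralities`, the dictionary-of-lists graph format, and the handling of unreachable vertices (they are simply skipped) are all assumptions made for illustration. The outer loop over sources is the part the slide marks as parallel.

```python
from collections import deque


def multipath_bfs(adj, s):
    """BFS from s that also counts shortest paths and records predecessors."""
    sigma = {v: 0 for v in adj}    # sigma[v]: number of shortest s-v paths
    dist = {v: -1 for v in adj}    # dist[v]: hop distance from s (-1 = unreached)
    preds = {v: [] for v in adj}   # preds[v]: predecessors of v on shortest paths
    order = []                     # visit order; read backwards it plays the role of stack S
    sigma[s], dist[s] = 1, 0
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:                # w is reached for the first time
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:     # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return sigma, order, preds, dist


def centralities(adj):
    """Closeness, graph, stress and betweenness centrality of every vertex."""
    cc = {v: 0.0 for v in adj}
    gc = {v: 0.0 for v in adj}
    sc = {v: 0 for v in adj}
    bc = {v: 0.0 for v in adj}
    for s in adj:                          # sources are independent of each other
        # Shortest-path phase.
        sigma, order, preds, dist = multipath_bfs(adj, s)
        reached = [d for d in dist.values() if d > 0]
        if reached:
            cc[s] = 1.0 / sum(reached)
            gc[s] = 1.0 / max(reached)
        # Update phase: accumulate dependencies in reverse BFS order.
        delta_s = {v: 0 for v in adj}
        delta_b = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta_s[v] += 1 + delta_s[w]
                delta_b[v] += sigma[v] / sigma[w] * (1 + delta_b[w])
            if w != s:
                sc[w] += sigma[w] * delta_s[w]
                bc[w] += delta_b[w]
    return cc, gc, sc, bc
```

Because every ordered pair (s, t) is counted, stress and betweenness values on an undirected graph are twice those obtained under the unordered-pair convention.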
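5 Shortest path problems. Graph $G = (V, E)$ with $n = |V|$, $m = |E|$, and edge lengths $l : E \to \mathbb{R}_+$.
SSSP: one source, a 1 × n distance table (unweighted case: BFS).
MSSP: β sources, a β × n distance table (β SSSPs).
APSP: all sources, an n × n distance table (n SSSPs).
Output variants: distanceSSSP (distances only), singlepathSSSP (distances and a single shortest path), multipathSSSP (distances and all shortest paths).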
6 2-HEAP: Dijkstra's algorithm with a binary heap. MLB: solver from the 9th DIMACS challenge.
Problem: singlepathSSSP, n = 23.95M, m = 58.3M. Machine: 2-way Xeon X5460 3.16GHz (4 cores × 2).
SSSP results, given as CPU time [ms] (speedup), memory [GB]:
2-HEAP* (sequential): (.00)
2-HEAP* (2 threads): 74.4 (.94), .7
2-HEAP* (4 threads): 45. (.65), .00
2-HEAP* (8 threads): 0.7 (5.4), .46
MLB (sequential)
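For reference, Dijkstra's algorithm with a binary heap, the technique this slide evaluates, can be sketched with Python's standard `heapq` module. This is a plain sequential sketch under assumed conventions (the name `dijkstra` and the adjacency format `{vertex: [(neighbor, length), ...]}` are illustrative); it does not reproduce the multi-threaded implementation measured on the slide.

```python
import heapq


def dijkstra(adj, source):
    """Single-source shortest path distances using a binary heap."""
    dist = {source: 0}
    heap = [(0, source)]                   # entries are (tentative distance, vertex)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue                       # stale entry: v was already settled with a smaller distance
        for w, length in adj.get(v, ()):
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))   # lazy insertion instead of decrease-key
    return dist
```

The sketch pushes a new heap entry on every improvement and discards stale entries on pop, which avoids needing a decrease-key operation at the cost of a larger heap.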
7 Bottleneck analysis by thread placement.
Placements: diff-procs: different processors; diff-L2: same processor, different L2 caches; same-L2: same processor, same L2 cache.
Bottleneck (diff-procs; diff-L2; same-L2):
processor bandwidth: down; down; down
processor inside bandwidth: -; down; down
L2 cache sharing: -; -; down
SSSP (USA-road-d.USA.gr) on the 2-way Xeon X5460 (sequential; diff-procs; diff-L2; same-L2):
2-HEAP*: 5.4 s (± 0.00%); 5.44 s (-.9%); 5.6 s (-5.05%); 6.6 s (-8.94%)
d-heap: 7. s (± 0.00%); 7.6 s (-0.4%); 7.59 s (-4.74%); 8.79 s (-7.75%)
Fib-heap: 5.95 s (± 0.00%); 6.09 s (-0.87%); 6.56 s (-.68%); 8.7 s (-.%)
Dial's: 4.8 s (± 0.00%); 4.54 s (-.5%); 5.0 s (-.58%); 6.8 s (-.5%)
double buckets: 4.65 s (± 0.00%); 4.88 s (-4.7%); 5.5 s (-.4%); 6.64 s (-9.97%)
MLB: 5.69 s (± 0.00%); 5.85 s (-.74%); 6.7 s (-7.78%); 7.7 s (-6.9%)
Δ-stepping: .74 s (± 0.00%); .06 s (-.66%); .55 s (-6.4%); 6.49 s (-8.76%)
8 NETAL (NETwork Analysis Library): APSP and centrality (CC, GC, SC, BC) computation with NUMA-aware CPU affinity.
APSP solvers:
n-BFS: n runs of BFS (multipathBFS); used for unweighted CC, GC, SC, BC.
n-Dijkstra: n runs of Dijkstra's algorithm with a binary heap (multipathSSSP); used for weighted CC, GC, SC, BC.
n/β-MSLC: n/β runs of MSLC with a binary heap (distanceMSSP); used for APSP distances.
The four centralities CC, GC, SC, BC are computed from multipath shortest paths (unweighted or weighted) in parallel.
Y. Yasui et al.: NETAL: High-Performance Implementation of Network Analysis Library Considering Computer Memory Hierarchy.
9 NUMA-aware CPU affinity. Machine: 4-way Opteron 6174 (12 cores × 4 sockets). APSP threads are bound to CPU cores, with graph data placed according to CPU/memory affinity (worst: 48 -affinity; best: 6 8-affinity).
USA-road-d.NY.gr (n = 264K, m = 734K), TEPS (speedup) for n-BFS; n-Dijkstra; n/β-MSLC:
sequential: 0.5 M (.0); 0.8 M (.0); .4 M (.0)
12 threads, worst: 9.4 M (.7); 0. M (.); 6.6 M (9.)
12 threads, best: 99. M (4.6); 4. M (.4); 40.9 M (.7)
24 threads, worst: 87.8 M (8.9); .0 M (9.7); M (6.8)
24 threads, best: M (6.8); 49.8 M (.); 64.0 M (.5)
48 threads, worst: M (7.5); 5.0 M (.6); M (.)
48 threads, best: M (46.); 47.7 M (4.6); 47.5 M (5.7)
10 Figure: CPU time [seconds] per trial on the 4-way Opteron 6174 (12 cores × 4 sockets) for n-BFS, n-SSSP, and n/β-MSSP under the best and the worst CPU/memory affinity.
11 APSP computation time.
USA-road-d.USA.gr (n = 23.95M, m = 58.3M): .5 9.; distanceAPSP with n/β-MSLC: 7.75 (MLB: 9, Δ-stepping: 4); n-Dijkstra (multipathSSSP) versus MLB, Δ-stepping: 8, 4.
soc-LiveJournal (n = 4.85M, m = 68.99M): distanceAPSP with n/β-MSLC: .78 (MLB: 4, Δ-stepping: 04); n-Dijkstra (multipathSSSP) versus MLB, Δ-stepping: 0.
Solver (USA-road-d.USA.gr; soc-LiveJournal):
n-BFS: 70 days; 7.5 days
n-Dijkstra: 99 days; 9.6 days
n/β-MSLC: 7.75 days (β = 6); .78 days (β = )
LS-BFS: 557 days (= .5 years); 79.5 days
MLB: 774 days (= 4.9 years); 0.55 days
Δ-stepping: 5 days (= 9. years); 88. days
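12 Centrality indices (USA-road-d.LKS.gr (n = 2.76M, m = 6.89M)).
NETAL computes four centrality indices: $C_C$, $C_G$, $C_S$, $C_B$. Unweighted centralities (top row): 9.4 hours with n-BFS. Weighted centralities (bottom row): .8 hours with n-Dijkstra.
GraphCT needs 0.6 days for BC alone, without taking edge lengths into account (4-way Opteron 6174).
closeness: $C_C(v) = \dfrac{1}{\sum_{t \in V} d_G(v,t)}$
graph: $C_G(v) = \dfrac{1}{\max_{t \in V} d_G(v,t)}$
stress: $C_S(v) = \sum_{s \neq v \neq t \in V} \sigma_{st}(v)$
betweenness: $C_B(v) = \sum_{s \neq v \neq t \in V} \dfrac{\sigma_{st}(v)}{\sigma_{st}}$
(Yasui, Chuo University, CREST)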
13 Comparison of n-BFS, n-Dijkstra, and GraphCT. NETAL computes $C_C$, $C_G$, $C_S$, $C_B$; GraphCT computes $C_B$ only.
Instance (n; m): n-BFS ($C_C, C_G, C_S, C_B$); n-Dijkstra (weighted $C_C, C_G, C_S, C_B$); GraphCT ($C_B$):
USA-road-d.LKS.gr (2.8M; 6.9M): 9.5 h (SP 55 %, UP 45 %); .84 h (SP 69 %, UP %); 49.8 h
cit-patents (3.8M; 16.5M): .87 h (SP 7 %, UP 7 %); .5 h (SP 40 %, UP 60 %); .6 h
SP: shortest-path phase. UP: update phase.
Comparison with SSCA#2 on $C_B$; relative to SSCA#2: n-BFS .8, n-Dijkstra .4.
Instance: n-BFS ($C_C, C_G, C_S, C_B$); n-Dijkstra (weighted $C_C, C_G, C_S, C_B$); GraphCT ($C_B$); SSCA#2 ($C_B$):
R-MAT (n = 6.78M, m = 4.M): 6.0 seconds; 60.5 seconds; 60.0 seconds; error
0.8 MTEPS; .9 MTEPS; 48.5 MTEPS
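14 Graph500 and Green Graph500.
Graph500 ranks machines by BFS performance, measured in TEPS (traversed edges per second), on a Kronecker graph with 64 BFS runs. The graph is defined by SCALE and edgefactor (= 16): $n = 2^{SCALE}$, $m = edgefactor \cdot n$. Each run executes BFS, validates the BFS result, and computes TEPS; the score is the median TEPS over the 64 runs.
Green Graph500 ranks machines by TEPS/kW, with power measured during a BFS energy loop. Power meter: RemotePDU, Omron RC008.
Benchmark steps: graph generation, graph construction, BFS, validation.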
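The benchmark quantities named on this slide (problem size from SCALE and edgefactor, TEPS per run, median TEPS, and TEPS/kW) can be illustrated with a few lines of Python. The function names and the representation of a run as an `(edges_traversed, seconds)` pair are assumptions for illustration; the official benchmark defines precisely which edges count as traversed.

```python
import statistics


def graph500_size(scale, edgefactor=16):
    """Return (n, m) for a given SCALE: n = 2**SCALE, m = edgefactor * n."""
    n = 2 ** scale
    return n, edgefactor * n


def teps(edges_traversed, seconds):
    """Traversed edges per second for a single BFS run."""
    return edges_traversed / seconds


def median_teps(runs):
    """Median TEPS over a list of (edges_traversed, seconds) runs."""
    return statistics.median(teps(e, t) for e, t in runs)


def teps_per_kw(teps_value, watts):
    """Green Graph500 style energy efficiency: TEPS per kilowatt."""
    return teps_value / (watts / 1000.0)
```

A run list for the real benchmark would hold 64 entries, one per BFS source.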
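15 Kronecker graph. A Kronecker graph of a given SCALE is obtained by taking the Kronecker product of an initiator matrix SCALE times; Graph500 fixes the initiator matrix.
Kronecker graph with SCALE 26, edgefactor 16: $n = 2^{26}$ = 67.1M, $m = 2^{31}$ = 2147.5M.
Figure: number of nodes versus node degree.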
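A Kronecker (R-MAT style) edge generator can be sketched as below. The initiator probabilities on the slide did not survive extraction, so the defaults here (A = 0.57, B = 0.19, C = 0.19, D = 0.05) are taken from the Graph500 specification rather than from the slide. The function name, the fixed seed, and the omission of the vertex relabeling and edge shuffling steps of the reference generator are simplifications for illustration.

```python
import random


def kronecker_edges(scale, edgefactor=16, a=0.57, b=0.19, c=0.19, seed=0):
    """Generate edgefactor * 2**scale edges by recursive quadrant selection."""
    rng = random.Random(seed)
    n = 1 << scale
    edges = []
    for _ in range(edgefactor * n):
        u = v = 0
        for _ in range(scale):             # one bit of each endpoint per level
            r = rng.random()
            u <<= 1
            v <<= 1
            if r < a:                      # top-left quadrant: both bits stay 0
                pass
            elif r < a + b:                # top-right quadrant
                v |= 1
            elif r < a + b + c:            # bottom-left quadrant
                u |= 1
            else:                          # bottom-right quadrant, probability d = 1 - a - b - c
                u |= 1
                v |= 1
        edges.append((u, v))
    return edges
```

Because the top-left quadrant is favored at every level, low-numbered vertices collect far more edges than high-numbered ones, which produces the skewed degree distribution the figure on this slide plots.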
16 Level-synchronized parallel BFS. BFS proceeds one level at a time, and the vertices of each level are processed in parallel; updates to shared data require atomic operations.
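The structure of level-synchronized BFS can be sketched as follows. This is a sequential Python sketch: the loop over the frontier is the part a parallel implementation would distribute across threads, and the comments mark where an atomic operation and a barrier would be needed. The function name and graph format are assumptions for illustration.

```python
def level_synchronized_bfs(adj, source):
    """BFS that processes one whole level (frontier) before starting the next."""
    parent = {v: None for v in adj}        # None marks an unvisited vertex
    level = {source: 0}
    parent[source] = source
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        for v in frontier:                 # parallel loop in a multi-threaded version
            for w in adj[v]:
                # In parallel code this check-and-set must be atomic
                # (for example a compare-and-swap), so that w gets one parent.
                if parent[w] is None:
                    parent[w] = v
                    level[w] = depth + 1
                    next_frontier.append(w)
        frontier = next_frontier           # implicit barrier between levels
        depth += 1
    return parent, level
```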
17 Hybrid algorithm for parallel BFS [Beamer]: direction-optimizing breadth-first search. Depending on the frontier, each level is processed either by forward search (top-down step) or by backward search (bottom-up step).
18 Hybrid algorithm: switching between top-down and bottom-up. The switch from top-down to bottom-up and the switch from bottom-up back to top-down are decided from $m_f$, $m_u$, $n_f$, and $n$.
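The switching rule can be sketched in Python using the heuristic published by Beamer et al.: go from top-down to bottom-up when $m_f > m_u / \alpha$, and go back when $n_f < n / \beta$, where $m_f$ is the number of edges incident to the frontier, $m_u$ the number of edges incident to unvisited vertices, $n_f$ the frontier size, and $n$ the number of vertices. The exact conditions on the slide did not survive extraction, so this rule and the defaults α = 14, β = 24 are taken from Beamer's paper rather than from the slide. The sketch is sequential and assumes an undirected graph stored as a dictionary of neighbor lists.

```python
def hybrid_bfs(adj, source, alpha=14, beta=24):
    """Direction-optimizing BFS; returns the parent map and the direction used per level."""
    n = len(adj)
    parent = {v: None for v in adj}
    parent[source] = source
    frontier = {source}
    m_u = sum(len(nbrs) for nbrs in adj.values()) - len(adj[source])
    bottom_up = False
    directions = []
    while frontier:
        m_f = sum(len(adj[v]) for v in frontier)
        n_f = len(frontier)
        if not bottom_up and m_f > m_u / alpha:
            bottom_up = True               # frontier is heavy: search from unvisited vertices
        elif bottom_up and n_f < n / beta:
            bottom_up = False              # frontier is small again: search from the frontier
        directions.append("bottom-up" if bottom_up else "top-down")
        next_frontier = set()
        if bottom_up:
            for v in adj:                  # every unvisited vertex looks for a parent in the frontier
                if parent[v] is None:
                    for w in adj[v]:
                        if w in frontier:
                            parent[v] = w
                            next_frontier.add(v)
                            break          # one parent is enough, remaining edges are skipped
        else:
            for v in frontier:             # every frontier vertex scans all of its edges
                for w in adj[v]:
                    if parent[w] is None:
                        parent[w] = v
                        next_frontier.add(w)
        m_u -= sum(len(adj[v]) for v in next_frontier)
        frontier = next_frontier
    return parent, directions
```

The early `break` in the bottom-up step is where the saving comes from: once a vertex finds any parent in the frontier, its remaining edges are never examined.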
19 NUMA-aware hybrid algorithm on an Intel(R) Xeon(R) CPU with HT. Figures: TEPS versus number of threads at a fixed SCALE, and TEPS versus SCALE at a fixed number of threads, each for edgefactor = 8, 16, 32, 64.
20 Graph500 / Green Graph500 results. Columns: SCALE, n, m, BFS time, TEPS, power (W), TEPS/kW.
Machines:
WestmereEX (80): Xeon E7-4870 @ 2.40GHz (10 cores) x 4
SandyBridgeEP: Xeon E5-2690 @ 2.90GHz (6 cores) x
MagnyCours (48): Opteron 6174 @ 2.20GHz (12 cores) x 4
SandyBridgeEP: Xeon E5-690 @ .00GHz (6 cores) x
Xeon E5-60 @ .0GHz ( threads) x
WestmereEP (4): Xeon X5670 @ 2.93GHz ( cores) x
Core i7-80QM @ .70GHz ( cores)
Core i7-80QM @ .70GHz ( cores)
21 Graph500 / Green Graph500 results, with CPU and node columns added. Columns: CPU, node, SCALE, n, m, BFS time, TEPS, power (W), TEPS/kW. The machines are those listed on slide 20.
22 1st Green Graph500 list (ISC). GraphCREST entry: GraphCREST-Tegra / scale0 / 0.99 TEPS. Machine: ASUS Pad TF700T (NVIDIA Tegra 3, 1.7GHz, 4 cores) / 1 node.
23 Summary.
NETAL (NETwork Analysis Library): NUMA-aware implementation for Intel/AMD machines. APSP: .5 9., 7.75. Unweighted CC, GC, SC, BC: 9.4 (GraphCT: 0.6). Weighted CC, GC, SC, BC: .5 (GraphCT: .6).
Graph500: BFS on Kronecker graphs (small-world, scale-free). Single CPU node with HT, 80 threads: 0.90 TEPS.
Green Graph500: 5.