IPSJ SIG Technical Report Vol.2021-HPC-178 No /3/16 MPI 1,a) Extra-P Extra-P TSUBAME3.0 NPB 256 A C D 19.3% 5% MPI,,, 1. Extra-P [5] Ex

Size: px
Start display at page:

Download "IPSJ SIG Technical Report Vol.2021-HPC-178 No /3/16 MPI 1,a) Extra-P Extra-P TSUBAME3.0 NPB 256 A C D 19.3% 5% MPI,,, 1. Extra-P [5] Ex"

Transcription

1 Vol.221-HPC-178 No.19 MPI 1,a) Extra-P Extra-P TSUBAME3. NPB 256 A C D 19.3% 5% MPI,,, 1. Extra-P [5]Extra-P , Chofugaoka, Chofu, Tokyo , Japan a) arima@hpc.is.uec.ac.jp Extra-P Extra-P Extra-P MPI CPU 1

2 [4] TAU Score-P Extra-P [3][4] 2.2 Extra-P Extra- P[5] 21 [2] Extra-P Extra-P 2.3 IF 異なる条件で複数回 解析対象のアプリケーションを実行してプロファイルを取得する 問題サイズもしくは実行プロセス数を変数としてモデルの構築を行う 構築したモデルの中で 最も適合度の高いモデルを選択する 1: Extra-P Vol.221-HPC-178 No.19 4 x 2

3 y (1) y = ax + b (1) (2) y = a log 1 x + b (2) (3) y = a x + b (3) x x y (4) ax + b (x < x ) y = ax + b (x x ) (4) : TSUBAME3. 54 CPU Memory 12.15PFlops 138,24GB 2: TSUBAME3. Intel Xeon E5-268 V4 14(28) 2.4GHz 256GB 153.6GB/s GPU NVIDIA Tesla P1 4.2 TSUBAME3. TAU TSUBAME3. TSUBAME CPU(Intel Xeon E5-268 V4) 2 TSUBAME TAU (Tuning and Analysis Utilities) TAU C Python [4] NAS Parallel Benchmarks NAS Parallel Benchmarks (NPB) NPB [1]NPB A, B, C, D 4 B A 4 C B 4 D C (MAPE) (F t ) (A t ) 5 Vol.221-HPC-178 No.19 3

4 Vol.221-HPC-178 No.19 IS EP CG MG FT BT SP LU 3: NAS Parallel Benchmarks LU MAP E = 1% N N A t F t A t (5) t=1 [%] = A C D () () PC MAPE 4 5 MAPE NoData 4 64 A, B, C 5 B BT SP 1, 4, 16, 64 1, 2, 4, 8, 16, 32, 64 BT, SP BT SP MAPE. 1 MAPE. MAPE

5 Vol.221-HPC-178 No.19 4: 64 (MAPE [%] MAPE [%]) [%] BT 99(.,.) 1(.,.) (NoData) (NoData) CG 69(.,.) 13(21.2, 3.1) (NoData) 18(.,.) EP 1(.,.) (NoData) (NoData) (NoData) FT 57(.,.) 6(22.5, 22.6) (NoData) 37(.,.) IS 1(.,.) (NoData) (NoData) (NoData) LU 81(.,.) 19(.1,.4) (NoData) (NoData) MG 48(.,.) 4(27., 28.2) 3(1.3, 1.3) 9(.,.) SP 98(.,.) 2(.,.) (NoData) (NoData) 5: B (MAPE [%] MAPE [%]) [%] BT 78(.,.) 22(.,.5) (NoData) (NoData) CG 69(.,.) (NoData) (NoData) 31(., 12.2) EP 1(.,.) (NoData) (NoData) (NoData) FT 62(.,.) (NoData) (NoData) 38(., 88.7) IS 82(.,.) 14(14., 14.) (NoData) 4(88.7, 88.7) LU 77(.,.) 21(., 17.2) 2(.,.) (NoData) MG 72(.,.) (NoData) 14(91., 91.) 14(19.4, 19.4) SP 79(.,.) 21(.,.5) (NoData) (NoData) 2: B : B

6 Vol.221-HPC-178 No.19 6: 4 A, B, C, D 3 A, B, C 2 A, B 1 A 7: BT SP 5 1, 4, 16, 64, , 4, 16, , 4, , : BT SP 9 1, 2, 4, 8, 16, 32, 64, 128, , 2, 4, 8, 16, 32, 64, , 2, 4, 8, 16, , 2, 4, 8 2 1, MAPE B : 64 5: B % B BT SP 4 BT SP 6

7 9: 64 [%] [%] BT CG EP FT IS LU MG SP : B [%] [%] BT CG EP FT IS LU MG SP % 5.83% % 445.5% / Extra-P Extra-P 4.65% % 5% Vol.221-HPC-178 No.19 JSPS JP2H4193 [1] Bailey, D. H.: The NAS Parallel Benchmarks, RNR-94-7 (1994). [2] Calotoiu, A., Hoefler, T., Schulz, M., Shudler, S. and Wolf, F.: Insightful Automatic Performance Modeling, extra-p/slides/insightfulautomaticperformance ModelingTutorialPartI.pdf. [3] Knüpfer, A., Rössel, C., an Mey, D., Biersdorff, S., Diethelm, K., Eschweiler, D., Geimer, M., Gerndt, M., Lorenz, D., Malony, A. D., Nagel, W. E., Oleynik, Y., Philippen, P., Saviankou, P., Schmidl, D., Shende, S., Tschüter, R., Wagner, M., Wesarg, B. and Wolf, F.: Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir (211). [4] Performance Research Lab: TAU, https: // [5] Technical University of Darmstadt: Extra-P, extra-p/download.html. 7

8 MPI 1,a) MPI TSUBAME3. NAS Parallel Benchmark L1 58.7% 11.3% MPI,,, 1. 2 CPU [1] [2], [5] , Chofugaoka, Chofu, Tokyo , Japan a) hasegawa@hpc.is.uec.ac.jp [3], [7] 1

9 2. HPC TAU[5] TAU THROTTLE TAU THROTTLE 1 [6] Extra-P [7]Extra-P MPI OpenMP L1 MPI L1 L1 2 ( 8,32,128 ) L1 ( 256 ) L1 L1 L1 L1 3.2 linear,inverse,log,exponentail 4 y = ax + b (1) y = a + b ( a) x (2) y = log x + b (1 < a) log a (3) y = ab x + c (1 < b, c) (4) x y L1 a, b, c 4 4 MAPE MAPE MAPE MAP E = 1% N A t F t N A t (5) t=1 A t L1 F t L1 2

10 1: TSUBAME3. 54 CPU() Intel Xeon E5-268 V4 Processor(Broadwell-EP, 14, 2.4GHz) 2 RAM() 256GiB (DDR GB 8) Intel DC P35 2TB (NVMe, PCI-E 3. x4, R27/W18) Intel Omni-Path 1Gb/s 4 DDN SFA14KXE EXAScaler 2: TSUBAME3. [KB] L1 32 L1 32 L2 256 L3 35,84 N (6) linear y = ax + b (a > ) (6) x y, a, b (1) L TSUBAME3.[4] TSUBAME3. 1 TSUB- AME 54 2 CPU CPU 14 CPU 2 TSUBAME3. CPU L1 / L2 L3 [8] NAS Parallel Benchmarks (NPB) 6 3 [9] A, B, C, D 4 B C A B 4 D C FT, IS, LU 3 D 8 1 MPI 1 3: cg ep Embarassingly parallel ft 3 is lu mg L1 TAU PAPI[1] L1 L1 PC L1 256 L1 3

11 4: [%] (MAPE [%], MAPE [%]) linear inverse log exponential cg (.67, 9.35) (.57, 14.86) 1.79 (1.75, 1.75) (.23, 1.6) ep. (-,-) 1. (., 3.24). (-, -). (-, -) ft 7.14 (1.51, ) (., 127.4) 3.6 (8.35, 15.84) 2.41 (.31, 25.26) is 7.14 (1.1, 6.92) (.29, 9.71) 1.79 (1.27, 1.27) (.1, 32.82) lu 9.85 (.34, 16.9) (.77, 61.84).76 (3.17, 3.17) (.35, 29.87) mg 2.27 (2.26, 8.11) (.11, ). (-, -) 25. (.81, 25.96) 16,,,,,,,,,, :64, :128, :128, :128, :128, :64: L1 A C 256 D A B 2 A C 3 A D 4 D L L1 8,16,32,64,128,256 MAPE MAPE A,B,C,D 4 1 MAPE 1.8% mg D MAPE 2% mg D comm3 ex MAPE 1,% A,B,C,D 8,16, ,128,256 comm3 ex MAPE A B C D cg ep ft is lu mg benchmark 1: 4 linear,inverse,log,exponential MAPE 4 4 linear ft inverse inverse MAPE 1% 127% 1241% inverse inverse log MAPE exponential linear log MAPE 33% 5.2 (7) 4

12 average_error :64 :128 :128 :128 :128 :64:128 :64:128:256 :128:256 cg ep ft is lu mg average_error :64 :128 :128 :128 :128 :64:128 :64:128:256 :128:256 cg ep ft is lu mg 2: ( A) 4: ( C) average_error :64 :128 :128 :128 :128 :64:128 :64:128:256 :128:256 cg ep ft is lu mg average_error :64 :128 :128 :128 :128 :64:128 :64:128:256 :128:256 cg ep ft is lu mg 3: ( B) average error = 1 N f N f t=1 A t F t A t (7) N f A t F t A 2 B 3 C 4 D 5 8,64, % ABCD lu,ft,is 3 8,16,32 16,32,64 5: ( D) cg A 8,64, relative error relative error = 1 A F A (8) A F 1% relative error relative error 1% 6 2, 3, 4 A BA CA D 5

13 5: cg, A function name relative error[%].tau main MAIN makea sprnvc conj grad initialize mpi.3995 randlc icnvrt vecset sparse alloc space setup submatrix info.27 setup proc info relaitve_cost[%] :64 :128 :128 :128 :128 :64:128 :64:128:256 :128:256 cg_a ep_a ft_a is_a lu_a mg_a 7: A average_error cg ep ft is lu mg profile_number 6: relaitve_cost[%] :64 :128 :128 :128 :128 :64:128 :64:128:256 :128:256 cg_b ep_b ft_b is_b lu_b mg_b 8: B (9) 2 199% % 6 A C 3 1% A,B,C,D A 1,4,16, ,4, ,64, relative cost = 1 C p C E (9) C E C p (relative cost) 7 1 8,64, % cg A,B ep 7 A 1 D 6

14 relaitve_cost[%] :64 :128 :128 :128 :128 :64:128 :64:128:256 :128:256 cg_c ep_c ft_c is_c lu_c mg_c Relative_cost[%] cg ep ft is lu mg profile_number 9: C 11: 256 relaitve_cost[%] :64 :128 :128 :128 :128 :64:128 :64:128:256 :128:256 cg_d ep_d ft_d is_d lu_d mg_d 1: D D D C E C p (9) Relative cost 2 4.% % 2 2% 3 25% MPI L1 TSUBAME3. NPB 1.8% 8,64, % 1.8 A C 3 D 58.7% D 11.3% 6.2 L1 L2 L3 MPI MPI A D 256 E F HPC JSPS JP2H4193 7

15 [1] TOP 5 November 22 (Accessed on 1/28/221) [2] Knüpfer A. et al. (212) Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir. In: Brunst H., Müller M., Nagel W., Resch M. (eds) Tools for High Performance Computing 211. Springer, Berlin, Heidelberg. [3] PMaC Performance Modeling and Charactrization (Accessed on 1/26/221) [4] TSUBAME3. gsic.titech.ac.jp/manuals/handbook.ja/jobs/ (Accessed on 221/2/23) [5] S. Shende and A. D. Malony, The TAU Parallel Performance System, International Journal of High Performance Computing Applications, SAGE Publications, 2(2): , Summer 26 [6] TAU throttle research/tau/docs/tutorial/ch1s5.html (Accessed on 2/7/221) [7] Extra-P (Accessed on 1/8/221) [8] Intel Xeon E5-268 V4 products/91754/intel-xeon-processor-e v4-35m-cache-2-4-ghz.html (Accessed on 1/2/221) [9] NAS Parallel Benchmarks https: // (Accessed on 1/1/221) [1] Terpstra, D., Jagode, H., You, H., Dongarra, J. Collecting Performance Data with PAPI-C, Tools for High Performance Computing 29, Springer Berlin / Heidelberg, 3rd Parallel Tools Workshop, Dresden, Germany, pp , 21. 8

VXPRO R1400® ご提案資料

VXPRO R1400® ご提案資料 Intel Core i7 プロセッサ 920 Preliminary Performance Report ノード性能評価 ノード性能の評価 NAS Parallel Benchmark Class B OpenMP 版での性能評価 実行スレッド数を 4 で固定 ( デュアルソケットでは各プロセッサに 2 スレッド ) 全て 2.66GHz のコアとなるため コアあたりのピーク性能は同じ 評価システム

More information

07-二村幸孝・出口大輔.indd

07-二村幸孝・出口大輔.indd GPU Graphics Processing Units HPC High Performance Computing GPU GPGPU General-Purpose computation on GPU CPU GPU GPU *1 Intel Quad-Core Xeon E5472 3.0 GHz 2 6 MB L2 cache 1600 MHz FSB 80 GFlops 1 nvidia

More information

GPU n Graphics Processing Unit CG CAD

GPU n Graphics Processing Unit CG CAD GPU 2016/06/27 第 20 回 GPU コンピューティング講習会 ( 東京工業大学 ) 1 GPU n Graphics Processing Unit CG CAD www.nvidia.co.jp www.autodesk.co.jp www.pixar.com GPU n GPU ü n NVIDIA CUDA ü NVIDIA GPU ü OS Linux, Windows, Mac

More information

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU GPGPU (I) GPU GPGPU 1 GPU(Graphics Processing Unit) GPU GPGPU(General-Purpose computing on GPUs) GPU GPGPU GPU ( PC ) PC PC GPU PC PC GPU GPU 2008 TSUBAME NVIDIA GPU(Tesla S1070) TOP500 29 [1] 2009 AMD

More information

,4) 1 P% P%P=2.5 5%!%! (1) = (2) l l Figure 1 A compilation flow of the proposing sampling based architecture simulation

,4) 1 P% P%P=2.5 5%!%! (1) = (2) l l Figure 1 A compilation flow of the proposing sampling based architecture simulation 1 1 1 1 SPEC CPU 2000 EQUAKE 1.6 50 500 A Parallelizing Compiler Cooperative Multicore Architecture Simulator with Changeover Mechanism of Simulation Modes GAKUHO TAGUCHI 1 YOUICHI ABE 1 KEIJI KIMURA 1

More information

untitled

untitled A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }

More information

HP High Performance Computing(HPC)

HP High Performance Computing(HPC) ACCELERATE HP High Performance Computing HPC HPC HPC HPC HPC 1000 HPHPC HPC HP HPC HPC HPC HP HPCHP HP HPC 1 HPC HP 2 HPC HPC HP ITIDC HP HPC 1HPC HPC No.1 HPC TOP500 2010 11 HP 159 32% HP HPCHP 2010 Q1-Q4

More information

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h 23 FPGA CUDA Performance Comparison of FPGA Array with CUDA on Poisson Equation (lijiang@sekine-lab.ei.tuat.ac.jp), (kazuki@sekine-lab.ei.tuat.ac.jp), (takahashi@sekine-lab.ei.tuat.ac.jp), (tamukoh@cc.tuat.ac.jp),

More information

HPEハイパフォーマンスコンピューティング ソリューション

HPEハイパフォーマンスコンピューティング ソリューション HPE HPC / AI Page 2 No.1 * 24.8% No.1 * HPE HPC / AI HPC AI SGIHPE HPC / AI GPU TOP500 50th edition Nov. 2017 HPE No.1 124 www.top500.org HPE HPC / AI TSUBAME 3.0 2017 7 AI TSUBAME 3.0 HPE SGI 8600 System

More information

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装 2010 GPGPU 2010 9 29 MPI/Pthread (DDM) DDM CPU CPU CPU CPU FEM GPU FEM CPU Mult - NUMA Multprocessng Cell GPU Accelerator, GPU CPU Heterogeneous computng L3 cache L3 cache CPU CPU + GPU GPU L3 cache 4

More information

1 M32R Single-Chip Multiprocessor [2] [3] [4] [5] Linux/M32R UP(Uni-processor) SMP(Symmetric Multi-processor) MMU CPU nommu Linux/M32R Linux/M32R 2. M

1 M32R Single-Chip Multiprocessor [2] [3] [4] [5] Linux/M32R UP(Uni-processor) SMP(Symmetric Multi-processor) MMU CPU nommu Linux/M32R Linux/M32R 2. M M32R Linux SMP a) Implementation of Linux SMP kernel for M32R multiprocessor Hayato FUJIWARA a), Hitoshi YAMAMOTO, Hirokazu TAKATA, Kei SAKAMOTO, Mamoru SAKUGAWA, and Hiroyuki KONDO CPU OS 32 RISC M32R

More information

先進的計算基盤システムシンポジウム SACSIS2012 Symposium on Advanced Computing Systems and Infrastructures SACSIS /5/18 CPU, CPU., Memory-bound CPU,., Memory-bo

先進的計算基盤システムシンポジウム SACSIS2012 Symposium on Advanced Computing Systems and Infrastructures SACSIS /5/18 CPU, CPU., Memory-bound CPU,., Memory-bo CPU, CPU, Memory-bound CPU,, Memory-bound ( ) Performance Monitoring Counter(PMC), PMC (nmi watchdog), PMC CPU., PMC, CPU, Memory-bound, CPU-bound,, CPU,, PMC,,,, CPU, NPB 8, 5% CPU, CPU, 3%, 5% CPU, IS

More information

PC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 P

PC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 P PC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 PC PC PC PC PC Key Words:Grid, PC Cluster, Distributed

More information

untitled

untitled A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }

More information

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2 FFT 1 Fourier fast Fourier transform FFT FFT FFT 1 FFT FFT 2 Fourier 2.1 Fourier FFT Fourier discrete Fourier transform DFT DFT n 1 y k = j=0 x j ω jk n, 0 k n 1 (1) x j y k ω n = e 2πi/n i = 1 (1) n DFT

More information

supercomputer2010.ppt

supercomputer2010.ppt nanri@cc.kyushu-u.ac.jp 1 !! : 11 12! : nanri@cc.kyushu-u.ac.jp! : Word 2 ! PC GPU) 1997 7 http://wiredvision.jp/news/200806/2008062322.html 3 !! (Cell, GPU )! 4 ! etc...! 5 !! etc. 6 !! 20km 40 km ) 340km

More information

GPU GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1

GPU GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1 GPU 4 2010 8 28 1 GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1 Register & Shared Memory ( ) CPU CPU(Intel Core i7 965) GPU(Tesla

More information

卒業論文

卒業論文 PC OpenMP SCore PC OpenMP PC PC PC Myrinet PC PC 1 OpenMP 2 1 3 3 PC 8 OpenMP 11 15 15 16 16 18 19 19 19 20 20 21 21 23 26 29 30 31 32 33 4 5 6 7 SCore 9 PC 10 OpenMP 14 16 17 10 17 11 19 12 19 13 20 1421

More information

hpc141_shirahata.pdf

hpc141_shirahata.pdf GPU アクセラレータと不揮発性メモリ を考慮した I/O 性能の予備評価 白幡晃一 1,2 佐藤仁 1,2 松岡聡 1 1: 東京工業大学 2: JST CREST 1 GPU と不揮発性メモリを用いた 大規模データ処理 大規模データ処理 センサーネットワーク 遺伝子情報 SNS など ペタ ヨッタバイト級 高速処理が必要 スーパーコンピュータ上での大規模データ処理 GPU 高性能 高バンド幅 例

More information

fiš„v8.dvi

fiš„v8.dvi (2001) 49 2 333 343 Java Jasp 1 2 3 4 2001 4 13 2001 9 17 Java Jasp (JAva based Statistical Processor) Jasp Jasp. Java. 1. Jasp CPU 1 106 8569 4 6 7; fuji@ism.ac.jp 2 106 8569 4 6 7; nakanoj@ism.ac.jp

More information

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N GPU 1 1 2 1, 3 2, 3 (Graphics Unit: GPU) GPU GPU GPU Evaluation of GPU Computing Based on An Automatic Program Generation Technology Makoto Sugawara, 1 Katsuto Sato, 1 Kazuhiko Komatsu, 2 Hiroyuki Takizawa

More information

Shonan Institute of Technology MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Paral

Shonan Institute of Technology MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Paral MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Parallel Computer Ships1 Makoto OYA*, Hiroto MATSUBARA**, Kazuyoshi SAKURAI** and Yu KATO**

More information

workshop Eclipse TAU AICS.key

workshop Eclipse TAU AICS.key 11 AICS 2016/02/10 1 Bryzgalov Peter @ HPC Usability Research Team RIKEN AICS Copyright 2016 RIKEN AICS 2 3 OS X, Linux www.eclipse.org/downloads/packages/eclipse-parallel-application-developers/lunasr2

More information

倍々精度RgemmのnVidia C2050上への実装と応用

倍々精度RgemmのnVidia C2050上への実装と応用 .. maho@riken.jp http://accc.riken.jp/maho/,,, 2011/2/16 1 - : GPU : SDPA-DD 10 1 - Rgemm : 4 (32 ) nvidia C2050, GPU CPU 150, 24GFlops 25 20 GFLOPS 15 10 QuadAdd Cray, QuadMul Sloppy Kernel QuadAdd Cray,

More information

iphone GPGPU GPU OpenCL Mac OS X Snow LeopardOpenCL iphone OpenCL OpenCL NVIDIA GPU CUDA GPU GPU GPU 15 GPU GPU CPU GPU iii OpenMP MPI CPU OpenCL CUDA OpenCL CPU OpenCL GPU NVIDIA Fermi GPU Fermi GPU GPU

More information

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted DEGIMA LINPACK Energy Performance for LINPACK Benchmark on DEGIMA 1 AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK 1.4698 GFlops/Watt 1.9658 GFlops/Watt Abstract GPU Computing has

More information

Ver. 3.8 Ver NOTE E v3 2.4GHz, 20M cache, 8.00GT/s QPI,, HT, 8C/16T 85W E v3 1.6GHz, 15M cache, 6.40GT/s QPI,

Ver. 3.8 Ver NOTE E v3 2.4GHz, 20M cache, 8.00GT/s QPI,, HT, 8C/16T 85W E v3 1.6GHz, 15M cache, 6.40GT/s QPI, PowerEdge T630 Contents RAID /RAID & PCIe GPU OS v3.8 Apr. 2017 P3-5 P6 P7 P8-9 P10-11 P12-16 P17-79 P80-85 P86-87 P88-90 P90 P91-92 P93-96 P97-100 P101-107 P107-108 P109-110 2017 4 28 2016 4 22 Ver. 3.8

More information

HPE Moonshot System ~ビッグデータ分析&モバイルワークプレイスを新たなステージへ~

HPE Moonshot System ~ビッグデータ分析&モバイルワークプレイスを新たなステージへ~ Brochure HPE Moonshot System HPE Moonshot System 4.3U 45 HPE Moonshot System Xeon & HPE Moonshot System HPE Moonshot System HPE HPE Moonshot System &IoT & SoC Xeon D-1500 Broadwell-DE HPE ProLiant m510

More information

HPC可視化_小野2.pptx

HPC可視化_小野2.pptx 大 小 二 生 高 方 目 大 方 方 方 Rank Site Processors RMax Processor System Model 1 DOE/NNSA/LANL 122400 1026000 PowerXCell 8i BladeCenter QS22 Cluster 2 DOE/NNSA/LLNL 212992 478200 PowerPC 440 BlueGene/L 3 Argonne

More information

211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G

211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS211 211/1/18 GPU 4 8 BLAS 4 8 BLAS Basic Linear Algebra Subprograms GPU Graphics Processing Unit 4 8 double 2 4 double-double DD 4 4 8 quad-double

More information

B 2 Thin Q=3 0 0 P= N ( )P Q = 2 3 ( )6 N N TSUB- Hub PCI-Express (PCIe) Gen 2 x8 AME1 5) 3 GPU Socket 0 High-performance Linpack 1

B 2 Thin Q=3 0 0 P= N ( )P Q = 2 3 ( )6 N N TSUB- Hub PCI-Express (PCIe) Gen 2 x8 AME1 5) 3 GPU Socket 0 High-performance Linpack 1 TSUBAME 2.0 Linpack 1,,,, Intel NVIDIA GPU 2010 11 TSUBAME 2.0 Linpack 2CPU 3GPU 1400 Dual-Rail QDR InfiniBand TSUBAME 1.0 30 2.4PFlops TSUBAME 1.0 Linpack GPU 1.192PFlops PFlops Top500 4 Achievement of

More information

XACCの概要

XACCの概要 2 global void kernel(int a[max], int llimit, int ulimit) {... } : int main(int argc, char *argv[]){ MPI_Int(&argc, &argc); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size); dx

More information

Microsoft PowerPoint - sales2.ppt

Microsoft PowerPoint - sales2.ppt 最適化とは何? CPU アーキテクチャに沿った形で最適な性能を抽出できるようにする技法 ( 性能向上技法 ) コンパイラによるプログラム最適化 コンパイラメーカの技量 経験量に依存 最適化ツールによるプログラム最適化 KAP (Kuck & Associates, Inc. ) 人によるプログラム最適化 アーキテクチャのボトルネックを知ること 3 使用コンパイラによる性能の違い MFLOPS 90

More information

rank ”«‘‚“™z‡Ì GPU ‡É‡æ‡éŁÀŠñ›»

rank ”«‘‚“™z‡Ì GPU ‡É‡æ‡éŁÀŠñ›» rank GPU ERATO 2011 11 1 1 / 26 GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26 GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26 GPU rank/select wavelet tree balanced

More information

橡3_2石川.PDF

橡3_2石川.PDF PC RWC 01/10/31 2 1 SCore 1,024 PC SCore III PC 01/10/31 3 SCore SCore Aug. 1995 Feb. 1996 Oct. 1996 1997-1998 Oct. 1999 Oct. 2000 April. 2001 01/10/31 4 2 SCore University of Bonn, Germany University

More information

Second-semi.PDF

Second-semi.PDF PC 2000 2 18 2 HPC Agenda PC Linux OS UNIX OS Linux Linux OS HPC 1 1CPU CPU Beowulf PC (PC) PC CPU(Pentium ) Beowulf: NASA Tomas Sterling Donald Becker 2 (PC ) Beowulf PC!! Linux Cluster (1) Level 1:

More information

HP Workstation Xeon 5600

HP Workstation Xeon 5600 HP Workstation Xeon 5600 HP 2 No.1 HP 5 3 Z 2No.1 HP :IDC's Worldwide Quarterly Workstation Tracker, 2009 Q4 14.0in Wide HP EliteBook 8440w/CT Mobile Workstation 15.6in Wide HP EliteBook 8540w Mobile Workstation

More information

it-ken_open.key

it-ken_open.key 深層学習技術の進展 ImageNet Classification 画像認識 音声認識 自然言語処理 機械翻訳 深層学習技術は これらの分野において 特に圧倒的な強みを見せている Figure (Left) Eight ILSVRC-2010 test Deep images and the cited4: from: ``ImageNet Classification with Networks et

More information

MATLAB® における並列・分散コンピューティング ~ Parallel Computing Toolbox™ & MATLAB Distributed Computing Server™ ~

MATLAB® における並列・分散コンピューティング ~ Parallel Computing Toolbox™ & MATLAB Distributed Computing Server™ ~ MATLAB における並列 分散コンピューティング ~ Parallel Computing Toolbox & MATLAB Distributed Computing Server ~ MathWorks Japan Application Engineering Group Takashi Yoshida 2016 The MathWorks, Inc. 1 System Configuration

More information

システムソリューションのご紹介

システムソリューションのご紹介 HP 2 C 製品 :VXPRO/VXSMP サーバ 製品アップデート 製品アップデート VXPRO と VXSMP での製品オプションの追加 8 ポート InfiniBand スイッチ Netlist HyperCloud メモリ VXPRO R2284 GPU サーバ 製品アップデート 8 ポート InfiniBand スイッチ IS5022 8 ポート 40G InfiniBand スイッチ

More information

Ver Ver NOTE E v3 2.4GHz, 20M cache, 8.00GT/s QPI,, HT, 8C/16T 85W E v3 1.6GHz, 15M cache, 6.40GT/s QPI

Ver Ver NOTE E v3 2.4GHz, 20M cache, 8.00GT/s QPI,, HT, 8C/16T 85W E v3 1.6GHz, 15M cache, 6.40GT/s QPI PowerEdge T630 Contents RAID /RAID & PCIe GPU OS V4.10 Mar.2018 P3-5 P6 P7 P8-9 P10-11 P12-16 P17-84 P85-90 P91-92 P93-95 P95 P96-97 P98-101 P102-105 P106-110 P110-111 P112-113 2018 3 30 2016 4 22 Ver.

More information

DO 時間積分 START 反変速度の計算 contravariant_velocity 移流項の計算 advection_adams_bashforth_2nd DO implicit loop( 陰解法 ) 速度勾配, 温度勾配の計算 gradient_cell_center_surface 速

DO 時間積分 START 反変速度の計算 contravariant_velocity 移流項の計算 advection_adams_bashforth_2nd DO implicit loop( 陰解法 ) 速度勾配, 温度勾配の計算 gradient_cell_center_surface 速 1 1, 2 1, 2 3 2, 3 4 GP LES ASUCA LES NVIDIA CUDA LES 1. Graphics Processing Unit GP General-Purpose SIMT Single Instruction Multiple Threads 1 2 3 4 1),2) LES Large Eddy Simulation 3) ASUCA 4) LES LES

More information

IPSJ SIG Technical Report Vol.2011-IOT-12 No /3/ , 6 Construction and Operation of Large Scale Web Contents Distribution Platfo

IPSJ SIG Technical Report Vol.2011-IOT-12 No /3/ , 6 Construction and Operation of Large Scale Web Contents Distribution Platfo 1 1 2 3 4 5 1 1, 6 Construction and Operation of Large Scale Web Contents Distribution Platform using Cloud Computing 1. ( ) 1 IT Web Yoshihiro Okamoto, 1 Naomi Terada and Tomohisa Akafuji, 1, 2 Yuko Okamoto,

More information

Ver. 3.9 Ver E v3 2.4GHz, 20M cache, 8.00GT/s QPI,, HT, 8C/16T 85W E v3 1.6GHz, 15M cache, 6.40GT/s QPI,, HT,

Ver. 3.9 Ver E v3 2.4GHz, 20M cache, 8.00GT/s QPI,, HT, 8C/16T 85W E v3 1.6GHz, 15M cache, 6.40GT/s QPI,, HT, PowerEdge R630 Contents RAID /RAID & PCIe OS P3-6 P7 P8 P9 P10-11 P12-16 P17-61 P62 P63-72 P73-75 P75 P76-79 P80-83 P84-90 P90-91 P92-93 V3.9 Apr. 2017 2017 4 28 2016 4 22 Ver. 3.9 Ver. 1.0 +- E5-2630

More information

GPUコンピューティング講習会パート1

GPUコンピューティング講習会パート1 GPU コンピューティング (CUDA) 講習会 GPU と GPU を用いた計算の概要 丸山直也 スケジュール 13:20-13:50 GPU を用いた計算の概要 担当丸山 13:50-14:30 GPU コンピューティングによる HPC アプリケーションの高速化の事例紹介 担当青木 14:30-14:40 休憩 14:40-17:00 CUDA プログラミングの基礎 担当丸山 TSUBAME の

More information

フカシギおねえさん問題の高速計算アルゴリズム

フカシギおねえさん問題の高速計算アルゴリズム JST ERATO 2013/7/26 Joint work with 1 / 37 1 2 3 4 5 6 2 / 37 1 2 3 4 5 6 3 / 37 : 4 / 37 9 9 6 10 10 25 5 / 37 9 9 6 10 10 25 Bousquet-Mélou (2005) 19 19 3 1GHz Alpha 8 Iwashita (Sep 2012) 21 21 3 2.67GHz

More information

Microsoft Word - HOKUSAI_system_overview_ja.docx

Microsoft Word - HOKUSAI_system_overview_ja.docx HOKUSAI システムの概要 1.1 システム構成 HOKUSAI システムは 超並列演算システム (GWMPC BWMPC) アプリケーション演算サーバ群 ( 大容量メモリ演算サーバ GPU 演算サーバ ) と システムの利用入口となるフロントエンドサーバ 用途の異なる 2 つのストレージ ( オンライン ストレージ 階層型ストレージ ) から構成されるシステムです 図 0-1 システム構成図

More information

21 20 20413525 22 2 4 i 1 1 2 4 2.1.................................. 4 2.1.1 LinuxOS....................... 7 2.1.2....................... 10 2.2........................ 15 3 17 3.1.................................

More information

1重谷.PDF

1重谷.PDF RSCC RSCC RSCC BMT 1 6 3 3000 3000 200310 1994 19942 VPP500/32PE 19992 VPP700E/128PE 160PE 20043 2 2 PC Linux 2048 CPU Intel Xeon 3.06GHzDual) 12.5 TFLOPS SX-7 32CPU/256GB 282.5 GFLOPS Linux 3 PC 1999

More information

IEEE HDD RAID MPI MPU/CPU GPGPU GPU cm I m cm /g I I n/ cm 2 s X n/ cm s cm g/cm

IEEE HDD RAID MPI MPU/CPU GPGPU GPU cm I m cm /g I I n/ cm 2 s X n/ cm s cm g/cm Neutron Visual Sensing Techniques Making Good Use of Computer Science J-PARC CT CT-PET TB IEEE HDD RAID MPI MPU/CPU GPGPU GPU cm I m cm /g I I n/ cm 2 s X n/ cm s cm g/cm cm cm barn cm thn/ cm s n/ cm

More information

26102 (1/2) LSISoC: (1) (*) (*) GPU SIMD MIMD FPGA DES, AES (2/2) (2) FPGA(8bit) (ISS: Instruction Set Simulator) (3) (4) LSI ECU110100ECU1 ECU ECU ECU ECU FPGA ECU main() { int i, j, k for { } 1 GP-GPU

More information

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS 2 3 4 5 2. 2.1 3 1) GPS Global Positioning System

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS 2 3 4 5 2. 2.1 3 1) GPS Global Positioning System Vol. 52 No. 1 257 268 (Jan. 2011) 1 2, 1 1 measurement. In this paper, a dynamic road map making system is proposed. The proposition system uses probe-cars which has an in-vehicle camera and a GPS receiver.

More information

Microsoft PowerPoint - GPU_computing_2013_01.pptx

Microsoft PowerPoint - GPU_computing_2013_01.pptx GPU コンピューティン No.1 導入 東京工業大学 学術国際情報センター 青木尊之 1 GPU とは 2 GPGPU (General-purpose computing on graphics processing units) GPU を画像処理以外の一般的計算に使う GPU の魅力 高性能 : ハイエンド GPU はピーク 4 TFLOPS 超 手軽さ : 普通の PC にも装着できる 低価格

More information

Vol.214-HPC-145 No /7/3 C #pragma acc directive-name [clause [[,] clause] ] new-line structured block Fortran!$acc directive-name [clause [[,] c

Vol.214-HPC-145 No /7/3 C #pragma acc directive-name [clause [[,] clause] ] new-line structured block Fortran!$acc directive-name [clause [[,] c Vol.214-HPC-145 No.45 214/7/3 OpenACC 1 3,1,2 1,2 GPU CUDA OpenCL OpenACC OpenACC High-level OpenACC CPU Intex Xeon Phi K2X GPU Intel Xeon Phi 27% K2X GPU 24% 1. TSUBAME2.5 CPU GPU CUDA OpenCL CPU OpenMP

More information

09中西

09中西 PC NEC Linux (1) (2) (1) (2) 1 Linux Linux 2002.11.22) LLNL Linux Intel Xeon 2300 ASCIWhite1/7 / HPC (IDC) 2002 800 2005 2004 HPC 80%Linux) Linux ASCI Purple (ASCI 100TFlops Blue Gene/L 1PFlops (2005)

More information

単位、情報量、デジタルデータ、CPUと高速化 ~ICT用語集~

単位、情報量、デジタルデータ、CPUと高速化  ~ICT用語集~ CPU ICT mizutani@ic.daito.ac.jp 2014 SI: Systèm International d Unités SI SI 10 1 da 10 1 d 10 2 h 10 2 c 10 3 k 10 3 m 10 6 M 10 6 µ 10 9 G 10 9 n 10 12 T 10 12 p 10 15 P 10 15 f 10 18 E 10 18 a 10 21

More information

JAPAN MARKETING JOURNAL 111 Vol.28 No.32008

JAPAN MARKETING JOURNAL 111 Vol.28 No.32008 Vol.28 No.32008 JAPAN MARKETING JOURNAL 111 Vol.28 No.32008 JAPAN MARKETING JOURNAL 111 Vol.28 No.32008 JAPAN MARKETING JOURNAL 111 Vol.28 No.32008 JAPAN MARKETING JOURNAL 111 Vol.28 No.32008 JAPAN MARKETING

More information

JAPAN MARKETING JOURNAL 113 Vol.29 No.12009

JAPAN MARKETING JOURNAL 113 Vol.29 No.12009 JAPAN MARKETING JOURNAL 113 Vol.29 No.12009 JAPAN MARKETING JOURNAL 113 Vol.29 No.12009 JAPAN MARKETING JOURNAL 113 Vol.29 No.12009 JAPAN MARKETING JOURNAL 113 Vol.29 No.12009 Vol.29 No.12009 JAPAN MARKETING

More information

IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1

IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1 SMYLE OpenCL 128 1 1 1 1 1 2 2 3 3 3 (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 128 SMYLEref SMYLE OpenCL SMYLE OpenCL Implementation and Evaluations on 128 Cores Takuji Hieda 1 Noriko Etani

More information

JAPAN MARKETING JOURNAL 110 Vol.28 No.22008

JAPAN MARKETING JOURNAL 110 Vol.28 No.22008 Vol.28 No.22008 JAPAN MARKETING JOURNAL 110 Vol.28 No.22008 JAPAN MARKETING JOURNAL 110 Vol.28 No.22008 JAPAN MARKETING JOURNAL 110 Vol.28 No.22008 JAPAN MARKETING JOURNAL 110 Vol.28 No.22008 JAPAN MARKETING

More information

[4] ACP (Advanced Communication Primitives) [1] ACP ACP [2] ACP Tofu UDP [3] HPC InfiniBand InfiniBand ACP 2 ACP, 3 InfiniBand ACP 4 5 ACP 2. ACP ACP

[4] ACP (Advanced Communication Primitives) [1] ACP ACP [2] ACP Tofu UDP [3] HPC InfiniBand InfiniBand ACP 2 ACP, 3 InfiniBand ACP 4 5 ACP 2. ACP ACP InfiniBand ACP 1,5,a) 1,5,b) 2,5 1,5 4,5 3,5 2,5 ACE (Advanced Communication for Exa) ACP (Advanced Communication Primitives) HPC InfiniBand ACP InfiniBand ACP ACP InfiniBand Open MPI 20% InfiniBand Implementation

More information

FINAL PROGRAM 25th Annual Workshop SWoPP / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012

FINAL PROGRAM 25th Annual Workshop SWoPP / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012 FINAL PROGRAM 25th Annual Workshop SWoPP 2012 2012 / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012 8 1 ( ) 8 3 ( ) 680-0017 101-5 http://www.torikenmin.jp/kenbun/

More information

福岡大学人文論叢47-3

福岡大学人文論叢47-3 679 pp. 1 680 2 681 pp. 3 682 4 683 5 684 pp. 6 685 7 686 8 687 9 688 pp. b 10 689 11 690 12 691 13 692 pp. 14 693 15 694 a b 16 695 a b 17 696 a 18 697 B 19 698 A B B B A B B A A 20 699 pp. 21 700 pp.

More information

Ver. 3.8 Ver NOTE E v3 2.4GHz, 20M cache, 8.00GT/s QPI,, HT, 8C/16T 85W E v3 1.6GHz, 15M cache, 6.40GT/s QPI,,

Ver. 3.8 Ver NOTE E v3 2.4GHz, 20M cache, 8.00GT/s QPI,, HT, 8C/16T 85W E v3 1.6GHz, 15M cache, 6.40GT/s QPI,, PowerEdge R930 Contents RAID /RAID & P3-5 P6 P7 P7 P8-P9 P10-13 P14-57 P58 PCIe P59-71 P72-73 P74-77 P78-81 OS P82-88 P88-89 P90-91 V3.8 Apr. 2017 2017 4 28 2016 4 22 Ver. 3.8 Ver. 1.0 +- NOTE E5-2630

More information

Ver. 3.8 Ver NOTE E v3 2.4GHz, 20M cache, 8.00GT/s QPI,, HT, 8C/16T 85W E v3 1.6GHz, 15M cache, 6.40GT/s QPI,,

Ver. 3.8 Ver NOTE E v3 2.4GHz, 20M cache, 8.00GT/s QPI,, HT, 8C/16T 85W E v3 1.6GHz, 15M cache, 6.40GT/s QPI,, PowerEdge R730 Contents RAID /RAID & PCIe GPU OS P3-5 P6 P7 P8 P9-10 P11-16 P17-55 P56 P57-66 P67-69 P70-72 P72 P73 P74-77 P78-81 P82-88 P88-89 P90-91 V3.8 Apr. 2017 2017 4 28 2016 4 22 Ver. 3.8 Ver. 1.0

More information

HPC (pay-as-you-go) HPC Web 2

HPC (pay-as-you-go) HPC Web 2 ,, 1 HPC (pay-as-you-go) HPC Web 2 HPC Amazon EC2 OpenFOAM GPU EC2 3 HPC MPI MPI Courant 1 GPGPU MPI 4 AMAZON EC2 GPU CLUSTER COMPUTE INSTANCE EC2 GPU (cg1.4xlarge) ( N. Virgina ) Quadcore Intel Xeon 5570

More information

1 2 4 5 9 10 12 3 6 11 13 14 0 8 7 15 Iteration 0 Iteration 1 1 Iteration 2 Iteration 3 N N N! N 1 MOPT(Merge Optimization) 3) MOPT 8192 2 16384 5 MOP

1 2 4 5 9 10 12 3 6 11 13 14 0 8 7 15 Iteration 0 Iteration 1 1 Iteration 2 Iteration 3 N N N! N 1 MOPT(Merge Optimization) 3) MOPT 8192 2 16384 5 MOP 10000 SFMOPT / / MOPT(Merge OPTimization) MOPT FMOPT(Fast MOPT) FMOPT SFMOPT(Subgrouping FMOPT) SFMOPT 2 8192 31 The Proposal and Evaluation of SFMOPT, a Task Mapping Method for 10000 Tasks Haruka Asano

More information

2012年度HPCサマーセミナー_多田野.pptx

2012年度HPCサマーセミナー_多田野.pptx ! CCS HPC! I " tadano@cs.tsukuba.ac.jp" " 1 " " " " " " " 2 3 " " Ax = b" " " 4 Ax = b" A = a 11 a 12... a 1n a 21 a 22... a 2n...... a n1 a n2... a nn, x = x 1 x 2. x n, b = b 1 b 2. b n " " 5 Gauss LU

More information

Vol. 46, No. SIG 12(ACS 11), pp , August c MegaScript, MegaScript MegaScript MegaScript MegaScript Construction of Accurate Task

Vol. 46, No. SIG 12(ACS 11), pp , August c MegaScript, MegaScript MegaScript MegaScript MegaScript Construction of Accurate Task Vol. 46, No. SIG 12(ACS 11), pp. 181 193, August 2005. c 2005 1 MegaScript, MegaScript MegaScript MegaScript MegaScript Construction of Accurate Task Models for the MegaScript Task Parallel Language Hiroshi

More information

次世代スーパーコンピュータのシステム構成案について

次世代スーパーコンピュータのシステム構成案について 6 19 4 27 1. 2. 3. 3.1 3.2 A 3.3 B 4. 5. 2007/4/27 4 1 1. 2007/4/27 4 2 NEC NHF2 18 9 19 19 2 28 10PFLOPS2.5PB 30MW 3,200 18 12 12 SimFold, GAMESS, Modylas, RSDFT, NICAM, LatticeQCD, LANS HPL, NPB-FT 19

More information

untitled

untitled taisuke@cs.tsukuba.ac.jp http://www.hpcs.is.tsukuba.ac.jp/~taisuke/ CP-PACS HPC PC post CP-PACS CP-PACS II 1990 HPC RWCP, HPC かつての世界最高速計算機も 1996年11月のTOP500 第一位 ピーク性能 614 GFLOPS Linpack性能 368 GFLOPS (地球シミュレータの前

More information

ProLiant ML110 Generation 4 システム構成図

ProLiant ML110 Generation 4 システム構成図 HP ProLiant ML110 Generation 5 2010 4 16 1 OVERVIEW ProLiant ML110 Generation 5 ProLiant ML110 Generation 5 1, 2 LED LED ( ) ( ) ( ) Lights-Out 100c ( ) 2 3 6 USB SATA ML110 G5 ProLiant ML110 G5 SATA /

More information

xx/xx Vol. Jxx A No. xx 1 Fig. 1 PAL(Panoramic Annular Lens) PAL(Panoramic Annular Lens) PAL (2) PAL PAL 2 PAL 3 2 PAL 1 PAL 3 PAL PAL 2. 1 PAL

xx/xx Vol. Jxx A No. xx 1 Fig. 1 PAL(Panoramic Annular Lens) PAL(Panoramic Annular Lens) PAL (2) PAL PAL 2 PAL 3 2 PAL 1 PAL 3 PAL PAL 2. 1 PAL PAL On the Precision of 3D Measurement by Stereo PAL Images Hiroyuki HASE,HirofumiKAWAI,FrankEKPAR, Masaaki YONEDA,andJien KATO PAL 3 PAL Panoramic Annular Lens 1985 Greguss PAL 1 PAL PAL 2 3 2 PAL DP

More information

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2013-HPC-139 No /5/29 Gfarm/Pwrake NICT NICT 10TB 100TB CPU I/O HPC I/O NICT Gf

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2013-HPC-139 No /5/29 Gfarm/Pwrake NICT NICT 10TB 100TB CPU I/O HPC I/O NICT Gf Gfarm/Pwrake NICT 1 1 1 1 2 2 3 4 5 5 5 6 NICT 10TB 100TB CPU I/O HPC I/O NICT Gfarm Gfarm Pwrake A Parallel Processing Technique on the NICT Science Cloud via Gfarm/Pwrake KEN T. MURATA 1 HIDENOBU WATANABE

More information

HP ProLiant 500シリーズ

HP ProLiant 500シリーズ HPProLiant5 DL58/585 HPProLiant5 4 HPProLiant5 HPProLiant5 64 HPProLiant5 TPC-H@1GB 4, 34,99 SAP SD Benchmark Users QphH@1GB 3, 2, 1, 4, 3, 2, 1, DL58 G5, Xeon X735 DL585 G5, AMD Opteron 836SE 17,12 DL58

More information

1 Kinect for Windows M = [X Y Z] T M = [X Y Z ] T f (u,v) w 3.2 [11] [7] u = f X +u Z 0 δ u (X,Y,Z ) (5) v = f Y Z +v 0 δ v (X,Y,Z ) (6) w = Z +

1 Kinect for Windows M = [X Y Z] T M = [X Y Z ] T f (u,v) w 3.2 [11] [7] u = f X +u Z 0 δ u (X,Y,Z ) (5) v = f Y Z +v 0 δ v (X,Y,Z ) (6) w = Z + 3 3D 1,a) 1 1 Kinect (X, Y) 3D 3D 1. 2010 Microsoft Kinect for Windows SDK( (Kinect) SDK ) 3D [1], [2] [3] [4] [5] [10] 30fps [10] 3 Kinect 3 Kinect Kinect for Windows SDK 3 Microsoft 3 Kinect for Windows

More information

AV 1000 BASE-T LAN 90 IEEE ac USB (3 ) LAN (IEEE 802.1X ) LAN AWS (Amazon Web Services) AP 3 USB wget iperf3 wget 40 MBytes 2 wget 40 MByt

AV 1000 BASE-T LAN 90 IEEE ac USB (3 ) LAN (IEEE 802.1X ) LAN AWS (Amazon Web Services) AP 3 USB wget iperf3 wget 40 MBytes 2 wget 40 MByt 1 BYOD LAN 1 2 3 4 1 BYOD 1 Gb/s LAN BYOD LAN LAN Access Point (AP) IEEE 802.11n BYOD LAN AP wget iperf3 1 AP [2] 2 IEEE 802.11ac [3] AP 4 AV (207 m 2 ) ( 1 2 )[4, 5] AP Wave2 Aruba AP-335 Aruba LAN 7210

More information

01_20.eps

01_20.eps 3rd International Workshop on Japan Association for Food Function Clinical Research program Udayana Univ. / JAFCAR Joint Workshop 2009 in Bali 3rd International Workshop on Japan Association for Food Function

More information

PowerPoint プレゼンテーション

PowerPoint プレゼンテーション オープンソース カンファレンス 2017 OSAKA ライトニング トーク あのクラウドと比べてみたよ IBM クラウドのリアルベンチマーク 2017 年 1 月 28 日 日本アイ ビー エム株式会社 クラウド事業統括 クラウドエバンジェリスト 安田智有 @ytomoari tomoari.yasuda 話 日本 IBM クラウドマイスター 安田智有 1 お客様の よしやってみるか を応援してきました

More information

main.dvi

main.dvi PC 1 1 [1][2] [3][4] ( ) GPU(Graphics Processing Unit) GPU PC GPU PC ( 2 GPU ) GPU Harris Corner Detector[5] CPU ( ) ( ) CPU GPU 2 3 GPU 4 5 6 7 1 toyohiro@isc.kyutech.ac.jp 45 2 ( ) CPU ( ) ( ) () 2.1

More information

develop

develop SCore SCore 02/03/20 2 1 HA (High Availability) HPC (High Performance Computing) 02/03/20 3 HA (High Availability) Mail/Web/News/File Server HPC (High Performance Computing) Job Dispatching( ) Parallel

More information

IPSJ SIG Technical Report Vol.2015-HPC-150 No /8/6 I/O Jianwei Liao 1 Gerofi Balazs 1 1 Guo-Yuan Lien Prototyping F

IPSJ SIG Technical Report Vol.2015-HPC-150 No /8/6 I/O Jianwei Liao 1 Gerofi Balazs 1 1 Guo-Yuan Lien Prototyping F I/O Jianwei Liao 1 Gerofi Balazs 1 1 Guo-Yuan Lien 1 1 1 1 1 30 30 100 30 30 2 Prototyping File I/O Arbitrator Middleware for Real-Time Severe Weather Prediction System Jianwei Liao 1 Gerofi Balazs 1 Yutaka

More information

[1] [2] [3] (RTT) 2. Android OS Android OS Google OS 69.7% [4] 1 Android Linux [5] Linux OS Android Runtime Dalvik Dalvik UI Application(Home,T

[1] [2] [3] (RTT) 2. Android OS Android OS Google OS 69.7% [4] 1 Android Linux [5] Linux OS Android Runtime Dalvik Dalvik UI Application(Home,T LAN Android Transmission-Control Middleware on multiple Android Terminals in a WLAN Environment with consideration of Round Trip Time Ai HAYAKAWA, Saneyasu YAMAGUCHI, and Masato OGUCHI Ochanomizu University

More information

1 Web DTN DTN 2. 2 DTN DTN Epidemic [5] Spray and Wait [6] DTN Android Twitter [7] 2 2 DTN 10km 50m % %Epidemic 99% 13.4% 10km DTN [8] 2

1 Web DTN DTN 2. 2 DTN DTN Epidemic [5] Spray and Wait [6] DTN Android Twitter [7] 2 2 DTN 10km 50m % %Epidemic 99% 13.4% 10km DTN [8] 2 DEIM Forum 2014 E7-1 Web DTN 112 8610 2-1-1 UCLA Computer Science Department 3803 Boelter Hall, Los Angeles, CA 90095-1596, USA E-mail: yuka@ogl.is.ocha.ac.jp, mineo@cs.ucla.edu, oguchi@computer.org Web

More information

GPGPU

GPGPU GPGPU 2013 1008 2015 1 23 Abstract In recent years, with the advance of microscope technology, the alive cells have been able to observe. On the other hand, from the standpoint of image processing, the

More information

TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日

TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日 TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日 目次 1. TSUBAMEのGPU 環境 2. プログラム作成 3. プログラム実行 4. 性能解析 デバッグ サンプルコードは /work0/gsic/seminars/gpu- 2011-09- 28 からコピー可能です 1.

More information

Microsoft PowerPoint - RBU-introduction-J.pptx

Microsoft PowerPoint - RBU-introduction-J.pptx Reedbush-U の概要 ログイン方法 東京大学情報基盤センタースーパーコンピューティング研究部門 http://www.cc.u-tokyo.ac.jp/ 東大センターのスパコン 2 基の大型システム,6 年サイクル (?) FY 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 Hitachi SR11K/J2 IBM Power 5+ 18.8TFLOPS,

More information

untitled

untitled c NUMA 1. 18 (Moore s law) 1Hz CPU 2. 1 (Register) (RAM) Level 1 (L1) L2 L3 L4 TLB (translation look-aside buffer) (OS) TLB TLB 3. NUMA NUMA (Non-uniform memory access) 819 0395 744 1 2014 10 Copyright

More information

5988_4096JA.qxd

5988_4096JA.qxd Agilent Infiniium 89601A Product Note Agilent Infiniium 1.......................................................................... 3 1.1 89601A VSA................................................... 3

More information

imai@eng.kagawa-u.ac.jp No1 No2 OS Wintel Intel x86 CPU No3 No4 8bit=2 8 =256(Byte) 16bit=2 16 =65,536(Byte)=64KB= 6 5 32bit=2 32 =4,294,967,296(Byte)=4GB= 43 64bit=2 64 =18,446,744,073,709,551,615(Byte)=16EB

More information

IPSJ SIG Technical Report Vol.2012-HCI-149 No /7/20 1 1,2 1 (HMD: Head Mounted Display) HMD HMD,,,, An Information Presentation Method for Weara

IPSJ SIG Technical Report Vol.2012-HCI-149 No /7/20 1 1,2 1 (HMD: Head Mounted Display) HMD HMD,,,, An Information Presentation Method for Weara 1 1,2 1 (: Head Mounted Display),,,, An Information Presentation Method for Wearable Displays Considering Surrounding Conditions in Wearable Computing Environments Masayuki Nakao 1 Tsutomu Terada 1,2 Masahiko

More information

2009 4

2009 4 2009 4 LU QR Cholesky A: n n A : A = IEEE 754 10 100 = : 1 / 36 A A κ(a) := A A 1. = κ(a) = Ax = b x := A 1 b Ay = b + b y := A 1 (b + b) x = y x x x κ(a) b b 2 / 36 IEEE 754 = 1 : u 1.11 10 16 = 10 16

More information

Input image Initialize variables Loop for period of oscillation Update height map Make shade image Change property of image Output image Change time L

Input image Initialize variables Loop for period of oscillation Update height map Make shade image Change property of image Output image Change time L 1,a) 1,b) 1/f β Generation Method of Animation from Pictures with Natural Flicker Abstract: Some methods to create animation automatically from one picture have been proposed. There is a method that gives

More information

SWoPP BOF BOF-1 8/3 19:10 BoF SWoPP : BOF-2 8/5 17:00 19:00 HW/SW 15 x5 SimMips/MieruPC M-Core/SimMc FPGA S

SWoPP BOF BOF-1 8/3 19:10 BoF SWoPP :   BOF-2 8/5 17:00 19:00 HW/SW 15 x5 SimMips/MieruPC M-Core/SimMc FPGA S FINAL PROGRAM 23rd Annual Workshop SWoPP 2010 2010 / / 2010 Kanazawa Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2010 8 3 ( ) 8 5 ( ) 920-0864 15 1 http://www.bunka-h.gr.jp/

More information

,., ping - RTT,., [2],RTT TCP [3] [4] Android.Android,.,,. LAN ACK. [5].. 3., 1.,. 3 AI.,,Amazon, (NN),, 1..NN,, (RNN) RNN

,., ping - RTT,., [2],RTT TCP [3] [4] Android.Android,.,,. LAN ACK. [5].. 3., 1.,. 3 AI.,,Amazon, (NN),, 1..NN,, (RNN) RNN DEIM Forum 2018 F1-1 LAN LSTM 112 8610 2-1-1 163-8677 1-24-2 E-mail: aoi@ogl.is.ocha.ac.jp, oguchi@is.ocha.ac.jp, sane@cc.kogakuin.ac.jp,,.,,., LAN,. Android LAN,. LSTM LAN., LSTM, Analysis of Packet of

More information

2ndD3.eps

2ndD3.eps CUDA GPGPU 2012 UDX 12/5/24 p. 1 FDTD GPU FDTD GPU FDTD FDTD FDTD PGI Acceralator CUDA OpenMP Fermi GPU (Tesla C2075/C2070, GTX 580) GT200 GPU (Tesla C1060, GTX 285) PC GPGPU 2012 UDX 12/5/24 p. 2 FDTD

More information