B 2 Thin Q=3 0 0 P= N ( )P Q = 2 3 ( )6 N N TSUB- Hub PCI-Express (PCIe) Gen 2 x8 AME1 5) 3 GPU Socket 0 High-performance Linpack 1

Size: px
Start display at page:

Download "B 2 Thin Q=3 0 0 P= N ( )P Q = 2 3 ( )6 N N TSUB- Hub PCI-Express (PCIe) Gen 2 x8 AME1 5) 3 GPU Socket 0 High-performance Linpack 1"

Transcription

1 TSUBAME 2.0 Linpack 1,,,, Intel NVIDIA GPU TSUBAME 2.0 Linpack 2CPU 3GPU 1400 Dual-Rail QDR InfiniBand TSUBAME PFlops TSUBAME 1.0 Linpack GPU 1.192PFlops PFlops Top500 4 Achievement of Linpack Performance of over 1PFlops on TSUBAME 2.0 Supercomputer Toshio Endo,, Akira Nukada, and Satoshi Matsuoka,, We report Linpack benchmark results on the TSUBAME 2.0 supercomputer, a large scale heterogeneous system with Intel processors and NVIDIA GPUs, operation of which has started in November The main part of this system consists of about 1400 compute nodes, each of which is equipped with two CPUs and three GPUs. The nodes are connected via full bisection fat tree network of Dual-Rail QDR InfiniBand. The theoretical peak performance reaches 2.4PFlops, 30 times larger than that of the predecessor TSUBAME 1.0, while its power consumption is similar to TSUBAME 1.0. We conducted improvement and tuning of Linpack benchmark considering characteristics of large scale systems with GPUs, and achieved Linpack performance of 1.192PFlops. This is the first result that exceeds 1PFlops in Japan, and ranked as 4th in the latest Top500 supercomputer ranking. GPU CPU 1. HPC (GPU ) 2008 Top500 2) 1PFlops HPC LANL RoadRunner Opteron 2006 CPU Sony/Toshiba/IBM PowerXCell 8i 8) TSUBAME Top500 TSUBAME 2.0 TSUBAME 2.0 TSUBAME PFlops NVIDIA GPU CPU GPU Intel Sandy-bridge NVIDIA Tesla M2050 Modular Cooling System (MCS) Tokyo Institute of Technology JST, CREST TSUBAME 2.0 Linpack National Institute of Informatics Linpack Top c 2011 Information Processing Society of Japan

2 B 2 Thin Q=3 0 0 P= N ( )P Q = 2 3 ( )6 N N TSUB- Hub PCI-Express (PCIe) Gen 2 x8 AME1 5) 3 GPU Socket 0 High-performance Linpack 10) IO hub 2 Socket 1 PCIe Gen2 x16 HCA, GPU 11) GPU PCIe MPI PCI-Express TSUBAME bit SuSE Linux 4071GPU 1.192PFlops Enterprise Server 11 Windows HPC server Top R2 Linux 1PFlops : ( PFlops) % Dual rail rail kW(Green500 1) ) 36 Voltaire GridDirector MFlops/W GridDirector 2. TSUBAME rail 6 12 TSUBAME Gbps QDR InfiniBand 7.1PBytes QDR InifiniBand 2 Dual rail ( 1) 1408 Thin 24 Medium 10 Fat Tesla M2050 GPU: NVIDIA Tesla Thin M2050 Fermi GPU 3GPU GPU (SM) 14 SM SIMD CUDA core 32 SM Thin : Hewlett-Packard 150Gbytes/s 3GB Proliant SL390s G7 6 Intel Xeon X5670 GDDR5 GPU 2.93GHz 2 NVIDIA Tesla M GFlops GPU TFlops Tesla 54GB CUDA DDR3 40Gbps QDR InfiniBand host channel adapter (HCA) 2 C 2 HCA, 3 GPU I/O 3. High performance Linpack IO Hub(IOH) 2 HCA Socket 0 CPU ( ) IO Linpack 374 c 2011 Information Processing Society of Japan

3 1 TSUBAME Thin High performance Linpack (HPL) ( k ) HPL : k L MPI LU N Flops : L P Q ( : 4) N B k ( U ) HPL NB N B B 375 c 2011 Information Processing Society of Japan

4 : U RoadRunner 8) (DTRSM) L : 1 U A k A k = A k L U TSUBAME 2.0 8GB/s (DGEMM) MPI 5 1.7TFlops HPL lookahead k + 1 k 4.2 TSUBAME 2.0 HPL HPL MPI CPU ( O(N 2 B) GPU O(N 2 (P + Q)) O(N 3 ) DGEMM/DTRSM N PCIe Linpack GPU N PCIe BLAS MPI U TSUBAME 2.0 U U 0, U 1, U A k A 0, A 1, A 2 TSUBAME 2.0 Linpack MPI (thread1) TSUBAME 1.2 5) GPU PCIe (thread2) : GPU (DGEMM) L MPI GPU TSUBAME L PCIe U 0 PCIe 2.0 GPU 92%, Xeon 8% GPU CPU GPU PCIe CPU Lin- CPU CPU pack : Linpack N N MPI 5 8% CPU MPI N TSUBAME 2.0 MPI 54GB GPU 3GPU 9GB GPU GPU 5. PCIe 5.1 TSUBAME c 2011 Information Processing Society of Japan

5 5 HPL 1 1 x PCI PCI (GFlops) (GB/s) (GB/s) x86 cluster RoadRunner TSUBAME TSUBAME TSUBAME 2.0 HPL 1 Socket 0 Socket 1 SUSE Linux Enterprise 11, Open- ( 1, 2) Socket 1 MPI 1.4.2, GCC 4.3 CUDA 3.1 BLAS Socket 0 Xeon GotoBLAS2 Linpack ) Tesla GPU NVIDIA DGEMM/DTRSM 6) NVIDIA BLAS CUBLAS MPI GPU Xeon TurboBoost : MPI GPU 3 DGEMM : Linpack (=GPU ) B CPU GPU PCI -PCIe 3 GPU DGEMM( Socket 0 CPU 1 Socket ) 7 1GPU 1 CPU 2 NVIDIA GPU DGEMM first touch PCIe CPU Linpack 377 c 2011 Information Processing Society of Japan

6 7 M2050 1GPU NVIDIA 8 ( M B) (B M) M2050 1GPU CUBLAS (M B) (B M) (M B) (B M) 5% B, M Linpack 350GFlops PCIe B TSUBAME2.0 Linpack B B (=GPU ) 4071 Linpack P Q = B = 1024 N = 2, 490, 368, B = 1024 DGEMM : 7 350GFlops M2050 GPU 43, , GFlops 1 (3 ) 35.4GB PCIe 360GFlops S1070 GPU PFlops GFlops DGEMM 80GFlops GFlops 1PFlops NVIDIA TSUBAME M2050 Fermi GPU %(=386GFlops) Top500 DGEMM 4 Tianhe-1A 3 NVIDIA Nebulae GPU CUBLAS PFlops Linpack 5.2 Linpack 52.1% TSUBAME 2.0 TSUBAME Linpack TSUB- 9 AME % 35GB DGEMM TSUBAME 2.0, TSUBAME 1.2 TSUB- 880GFlops AME 1.2 Opteron CPU c 2011 Information Processing Society of Japan

7 9 256 Linpack 10 TSUBAME 2.0, TSUBAME 1.2, TSUBAME TSUBAME 1.2 Opteron (CPU ) Linpack ClearSpeed Tesla S1070 GPU 2GPU TSUBAME Linpack 2.0 TSUBAME 2.0 ( ) 90 5) 100% 1% Linpack / Linpack Elem-DGEMM CPU DGEMM MCS Node-DGEMM CPU DGEMM Linpack (TSUBAME 1.2 TSUBAME Linpack 2.0 GPU ) PCI 1440kW Elem-DGEMM Node-DGEMM TSUBAME 1.2 TSUBAME 2.0 Linpack Green500 1) kW TSUBAME 1.2 Node-DGEMM Green500 Linpack Linpack 20% Linpack 21.3% 4 U ( 36kW TSUBAME 2.0 Peak ) Green500 Elem-DGEMM 5.1 Fermi GPU DGEMM 958MFlops/W Green500 2 Elem-DGEMM Node-DGEMM, Linpack the Greenest Production Supercomputer in the World 5.4 GRAPE-DR c 2011 Information Processing Society of Japan

8 2) TOP500 supercomputer sites TSUBAME 2.0 3) Cedric Augonnet, Samuel Thibault, Raymond Linpack Namyst, and Pierre-Andre Wacrenier. StarPU: 1.192PFlops 958MFlops/W A unified platform for task scheduling on heterogeneous multicore architectures. In Proceedings of International Euro-Par Conference on Parallel Processing, pages , TSUBAME 2.0 4) G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, P. Luszczek, GPU CPU A. Yarkhan, and J. Dongarra. Distibuted dense MPI numerical linear algebra algorithms on massively parallel architectures: DPLASMA. Tech- CPU/GPU / nical Report UT-CS , University of Tennessee Computer Science, MPI CUDA 5) Toshio Endo, Akira Nukada, Satoshi Matsuoka, and Naoya Maruyama. Linpack eval- uation on a supercomputer with heterogeneous accelerators. In Proceedings of IEEE IPDPS10, ( ) page 8pages, DAG GPU 6) Massimiliano Fatica. Accelerating Linpack StarPU 3) with CUDA on heterogeneous clusters. In DPLASMA 4) Proceedings of Workshop on General-purpose Computation on Graphics Processing Units Pivoting Cholesky (GPGPU 09), Linpack pivoting 7) K. Goto and R. A. van de Geijn. Anatomy of high-performance matrix multiplication. SMPSS/MPI 9) ACM Transactions on Mathematical Software, send/recv 34(3):1 25, Linpack 8) Michael Kistler, John Gunnels, Daniel Brokenshire, and Brad Benton. Petascale com- CPU puting with accelerators. In Proceedings of ACM Symposium on Principles and Practice of Paralle Computing (PPoPP09), pages , ) Vladimir Marjanovi, Jesus Labarta, Eduard Ayguade, and Mateo Valero. Overlapping communication and computation by using a hybrid NVIDIA Voltaire DDN MPI/SMPSs approach. In Proceedings of ACM ICS 10, pages 5 16, COE 10) A. Petitet, R. C. Whaley, J. Dongarra, JST-CREST and A. Cleary. HPL - a portable implementation of the high-performance Linpack, JST-ANR benchmark for distributed-memory computers. (11) ) TSUBAME 2.0 Linpack. pages 1 6, ) The GREEN500 list. (HOKKE-18). 380 c 2011 Information Processing Society of Japan

HP High Performance Computing(HPC)

HP High Performance Computing(HPC) ACCELERATE HP High Performance Computing HPC HPC HPC HPC HPC 1000 HPHPC HPC HP HPC HPC HPC HP HPCHP HP HPC 1 HPC HP 2 HPC HPC HP ITIDC HP HPC 1HPC HPC No.1 HPC TOP500 2010 11 HP 159 32% HP HPCHP 2010 Q1-Q4

More information

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU GPGPU (I) GPU GPGPU 1 GPU(Graphics Processing Unit) GPU GPGPU(General-Purpose computing on GPUs) GPU GPGPU GPU ( PC ) PC PC GPU PC PC GPU GPU 2008 TSUBAME NVIDIA GPU(Tesla S1070) TOP500 29 [1] 2009 AMD

More information

GPU n Graphics Processing Unit CG CAD

GPU n Graphics Processing Unit CG CAD GPU 2016/06/27 第 20 回 GPU コンピューティング講習会 ( 東京工業大学 ) 1 GPU n Graphics Processing Unit CG CAD www.nvidia.co.jp www.autodesk.co.jp www.pixar.com GPU n GPU ü n NVIDIA CUDA ü NVIDIA GPU ü OS Linux, Windows, Mac

More information

HPEハイパフォーマンスコンピューティング ソリューション

HPEハイパフォーマンスコンピューティング ソリューション HPE HPC / AI Page 2 No.1 * 24.8% No.1 * HPE HPC / AI HPC AI SGIHPE HPC / AI GPU TOP500 50th edition Nov. 2017 HPE No.1 124 www.top500.org HPE HPC / AI TSUBAME 3.0 2017 7 AI TSUBAME 3.0 HPE SGI 8600 System

More information

211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G

211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS211 211/1/18 GPU 4 8 BLAS 4 8 BLAS Basic Linear Algebra Subprograms GPU Graphics Processing Unit 4 8 double 2 4 double-double DD 4 4 8 quad-double

More information

GPUコンピューティング講習会パート1

GPUコンピューティング講習会パート1 GPU コンピューティング (CUDA) 講習会 GPU と GPU を用いた計算の概要 丸山直也 スケジュール 13:20-13:50 GPU を用いた計算の概要 担当丸山 13:50-14:30 GPU コンピューティングによる HPC アプリケーションの高速化の事例紹介 担当青木 14:30-14:40 休憩 14:40-17:00 CUDA プログラミングの基礎 担当丸山 TSUBAME の

More information

GPGPU

GPGPU GPGPU 2013 1008 2015 1 23 Abstract In recent years, with the advance of microscope technology, the alive cells have been able to observe. On the other hand, from the standpoint of image processing, the

More information

untitled

untitled A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }

More information

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted DEGIMA LINPACK Energy Performance for LINPACK Benchmark on DEGIMA 1 AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK 1.4698 GFlops/Watt 1.9658 GFlops/Watt Abstract GPU Computing has

More information

09中西

09中西 PC NEC Linux (1) (2) (1) (2) 1 Linux Linux 2002.11.22) LLNL Linux Intel Xeon 2300 ASCIWhite1/7 / HPC (IDC) 2002 800 2005 2004 HPC 80%Linux) Linux ASCI Purple (ASCI 100TFlops Blue Gene/L 1PFlops (2005)

More information

SC SC10 (International Conference for High Performance Computing, Networking, Storage and Analysis) (HPC) Ernest N.

SC SC10 (International Conference for High Performance Computing, Networking, Storage and Analysis) (HPC) Ernest N. SC10 2010 11 13 19 SC10 (International Conference for High Performance Computing, Networking, Storage and Analysis) (HPC) 1 2005 8 8 2010 4 Ernest N. Morial Convention Center (ENMCC) Climate Simulation(

More information

GPUコンピューティング講習会パート1

GPUコンピューティング講習会パート1 GPU コンピューティング (CUDA) 講習会 GPU と GPU を用いた計算の概要 丸山直也 スケジュール 13:20-13:50 GPU を用いた計算の概要 担当丸山 13:50-14:30 GPU コンピューティングによる HPC アプリケーションの高速化の事例紹介 担当青木 14:30-14:40 休憩 14:40-17:00 CUDA プログラミングの基礎 担当丸山 TSUBAME の

More information

TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日

TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日 TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日 目次 1. TSUBAMEのGPU 環境 2. プログラム作成 3. プログラム実行 4. 性能解析 デバッグ サンプルコードは /work0/gsic/seminars/gpu- 2011-09- 28 からコピー可能です 1.

More information

HBase Phoenix API Mars GPU MapReduce GPU Hadoop Hadoop Hadoop MapReduce : (1) MapReduce (2)JobTracker 1 Hadoop CPU GPU Fig. 1 The overview of CPU-GPU

HBase Phoenix API Mars GPU MapReduce GPU Hadoop Hadoop Hadoop MapReduce : (1) MapReduce (2)JobTracker 1 Hadoop CPU GPU Fig. 1 The overview of CPU-GPU GPU MapReduce 1 1 1, 2, 3 MapReduce GPGPU GPU GPU MapReduce CPU GPU GPU CPU GPU CPU GPU Map K-Means CPU 2GPU CPU 1.02-1.93 Improving MapReduce Task Scheduling for CPU-GPU Heterogeneous Environments Koichi

More information

Microsoft PowerPoint - GPU_computing_2013_01.pptx

Microsoft PowerPoint - GPU_computing_2013_01.pptx GPU コンピューティン No.1 導入 東京工業大学 学術国際情報センター 青木尊之 1 GPU とは 2 GPGPU (General-purpose computing on graphics processing units) GPU を画像処理以外の一般的計算に使う GPU の魅力 高性能 : ハイエンド GPU はピーク 4 TFLOPS 超 手軽さ : 普通の PC にも装着できる 低価格

More information

FINAL PROGRAM 22th Annual Workshop SWoPP / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing

FINAL PROGRAM 22th Annual Workshop SWoPP / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing FINAL PROGRAM 22th Annual Workshop SWoPP 2009 2009 / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2009 8 4 ( ) 8 6 ( ) 981-0933 1-2-45 http://www.forestsendai.jp

More information

07-二村幸孝・出口大輔.indd

07-二村幸孝・出口大輔.indd GPU Graphics Processing Units HPC High Performance Computing GPU GPGPU General-Purpose computation on GPU CPU GPU GPU *1 Intel Quad-Core Xeon E5472 3.0 GHz 2 6 MB L2 cache 1600 MHz FSB 80 GFlops 1 nvidia

More information

倍々精度RgemmのnVidia C2050上への実装と応用

倍々精度RgemmのnVidia C2050上への実装と応用 .. [email protected] http://accc.riken.jp/maho/,,, 2011/2/16 1 - : GPU : SDPA-DD 10 1 - Rgemm : 4 (32 ) nvidia C2050, GPU CPU 150, 24GFlops 25 20 GFLOPS 15 10 QuadAdd Cray, QuadMul Sloppy Kernel QuadAdd Cray,

More information

main.dvi

main.dvi PC 1 1 [1][2] [3][4] ( ) GPU(Graphics Processing Unit) GPU PC GPU PC ( 2 GPU ) GPU Harris Corner Detector[5] CPU ( ) ( ) CPU GPU 2 3 GPU 4 5 6 7 1 [email protected] 45 2 ( ) CPU ( ) ( ) () 2.1

More information

! 行行 CPUDSP PPESPECell/B.E. CPUGPU 行行 SIMD [SSE, AltiVec] 用 HPC CPUDSP PPESPE (Cell/B.E.) SPE CPUGPU GPU CPU DSP DSP PPE SPE SPE CPU DSP SPE 2

! 行行 CPUDSP PPESPECell/B.E. CPUGPU 行行 SIMD [SSE, AltiVec] 用 HPC CPUDSP PPESPE (Cell/B.E.) SPE CPUGPU GPU CPU DSP DSP PPE SPE SPE CPU DSP SPE 2 ! OpenCL [Open Computing Language] 言 [OpenCL C 言 ] CPU, GPU, Cell/B.E.,DSP 言 行行 [OpenCL Runtime] OpenCL C 言 API Khronos OpenCL Working Group AMD Broadcom Blizzard Apple ARM Codeplay Electronic Arts Freescale

More information

スーパーコンピュータ「京」の概要

スーパーコンピュータ「京」の概要 Overview of the K computer System 宮崎博行 草野義博 新庄直樹 庄司文由 横川三津夫 渡邊貞 あらまし HPCI CPUOS LINPACK 10 PFLOPSCPU 8 Abstract RIKEN and Fujitsu have been working together to develop the K computer, with the aim of beginning

More information

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2013-HPC-139 No /5/29 Gfarm/Pwrake NICT NICT 10TB 100TB CPU I/O HPC I/O NICT Gf

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2013-HPC-139 No /5/29 Gfarm/Pwrake NICT NICT 10TB 100TB CPU I/O HPC I/O NICT Gf Gfarm/Pwrake NICT 1 1 1 1 2 2 3 4 5 5 5 6 NICT 10TB 100TB CPU I/O HPC I/O NICT Gfarm Gfarm Pwrake A Parallel Processing Technique on the NICT Science Cloud via Gfarm/Pwrake KEN T. MURATA 1 HIDENOBU WATANABE

More information

HP Workstation 総合カタログ

HP Workstation 総合カタログ HP Workstation Z HP 6 Z HP HP Z840 Workstation P.9 HP Z640 Workstation & CPU P.10 HP Z440 Workstation P.11 17.3in WIDE HP ZBook 17 G2 Mobile Workstation P.15 15.6in WIDE HP ZBook 15 G2 Mobile Workstation

More information

[4] ACP (Advanced Communication Primitives) [1] ACP ACP [2] ACP Tofu UDP [3] HPC InfiniBand InfiniBand ACP 2 ACP, 3 InfiniBand ACP 4 5 ACP 2. ACP ACP

[4] ACP (Advanced Communication Primitives) [1] ACP ACP [2] ACP Tofu UDP [3] HPC InfiniBand InfiniBand ACP 2 ACP, 3 InfiniBand ACP 4 5 ACP 2. ACP ACP InfiniBand ACP 1,5,a) 1,5,b) 2,5 1,5 4,5 3,5 2,5 ACE (Advanced Communication for Exa) ACP (Advanced Communication Primitives) HPC InfiniBand ACP InfiniBand ACP ACP InfiniBand Open MPI 20% InfiniBand Implementation

More information

RDMAプロトコル: ネットワークパフォーマンスの向上

RDMAプロトコル: ネットワークパフォーマンスの向上 Database Acceleration Solution for HP ProLiant 2 2 3 4 I/O 5 IO 5 6 InfiniBand 6 RDMA 7 iser iscsi Extensions for RDMA 8 9 9 10 10 11 11 11 11 A : 12 B : 13 C : TCP/IP 14 15 HP 15 HP 15 15 I/OSSD Solid

More information

HP Workstation 総合カタログ

HP Workstation 総合カタログ HP Workstation E5 v2 Z Z SFF E5 v2 2 HP Windows Z 3 Performance Innovation Reliability 3 HPZ HP HP Z820 Workstation P.11 HP Z620 Workstation & CPU P.12 HP Z420 Workstation P.13 17.3in WIDE HP ZBook 17

More information

Microsoft PowerPoint - CCS学際共同boku-08b.ppt

Microsoft PowerPoint - CCS学際共同boku-08b.ppt マルチコア / マルチソケットノードに おけるメモリ性能のインパクト 研究代表者朴泰祐筑波大学システム情報工学研究科 [email protected] アウトライン 近年の高性能 PC クラスタの傾向と問題 multi-core/multi-socket ノードとメモリ性能 メモリバンド幅に着目した性能測定 multi-link network 性能評価 まとめ 近年の高性能 PC

More information

GPU GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1

GPU GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1 GPU 4 2010 8 28 1 GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1 Register & Shared Memory ( ) CPU CPU(Intel Core i7 965) GPU(Tesla

More information

Vol.214-HPC-145 No /7/3 C #pragma acc directive-name [clause [[,] clause] ] new-line structured block Fortran!$acc directive-name [clause [[,] c

Vol.214-HPC-145 No /7/3 C #pragma acc directive-name [clause [[,] clause] ] new-line structured block Fortran!$acc directive-name [clause [[,] c Vol.214-HPC-145 No.45 214/7/3 OpenACC 1 3,1,2 1,2 GPU CUDA OpenCL OpenACC OpenACC High-level OpenACC CPU Intex Xeon Phi K2X GPU Intel Xeon Phi 27% K2X GPU 24% 1. TSUBAME2.5 CPU GPU CUDA OpenCL CPU OpenMP

More information

IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1

IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1 SMYLE OpenCL 128 1 1 1 1 1 2 2 3 3 3 (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 128 SMYLEref SMYLE OpenCL SMYLE OpenCL Implementation and Evaluations on 128 Cores Takuji Hieda 1 Noriko Etani

More information

IPSJ SIG Technical Report Vol.2011-IOT-12 No /3/ , 6 Construction and Operation of Large Scale Web Contents Distribution Platfo

IPSJ SIG Technical Report Vol.2011-IOT-12 No /3/ , 6 Construction and Operation of Large Scale Web Contents Distribution Platfo 1 1 2 3 4 5 1 1, 6 Construction and Operation of Large Scale Web Contents Distribution Platform using Cloud Computing 1. ( ) 1 IT Web Yoshihiro Okamoto, 1 Naomi Terada and Tomohisa Akafuji, 1, 2 Yuko Okamoto,

More information

MATLAB® における並列・分散コンピューティング ~ Parallel Computing Toolbox™ & MATLAB Distributed Computing Server™ ~

MATLAB® における並列・分散コンピューティング ~ Parallel Computing Toolbox™ & MATLAB Distributed Computing Server™ ~ MATLAB における並列 分散コンピューティング ~ Parallel Computing Toolbox & MATLAB Distributed Computing Server ~ MathWorks Japan Application Engineering Group Takashi Yoshida 2016 The MathWorks, Inc. 1 System Configuration

More information

2). 3) 4) 1.2 NICTNICT DCRA Dihedral Corner Reflector micro-arraysdcra DCRA DCRA DCRA 3D DCRA PC USB PC PC ON / OFF Velleman K8055 K8055 K8055

2). 3) 4) 1.2 NICTNICT DCRA Dihedral Corner Reflector micro-arraysdcra DCRA DCRA DCRA 3D DCRA PC USB PC PC ON / OFF Velleman K8055 K8055 K8055 1 1 1 2 DCRA 1. 1.1 1) 1 Tactile Interface with Air Jets for Floating Images Aya Higuchi, 1 Nomin, 1 Sandor Markon 1 and Satoshi Maekawa 2 The new optical device DCRA can display floating images in free

More information

IPSJ SIG Technical Report Vol.2013-HPC-138 No /2/21 GPU CRS 1,a) 2,b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla

IPSJ SIG Technical Report Vol.2013-HPC-138 No /2/21 GPU CRS 1,a) 2,b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla GPU CRS 1,a),b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla K0 CUDA5.0 cusparse CRS SpMV 00 1.86 177 1. SpMV SpMV CRS Compressed Row Storage *1 SpMV GPU GPU NVIDIA Kepler

More information

HPC可視化_小野2.pptx

HPC可視化_小野2.pptx 大 小 二 生 高 方 目 大 方 方 方 Rank Site Processors RMax Processor System Model 1 DOE/NNSA/LANL 122400 1026000 PowerXCell 8i BladeCenter QS22 Cluster 2 DOE/NNSA/LLNL 212992 478200 PowerPC 440 BlueGene/L 3 Argonne

More information

HP xw9400 Workstation

HP xw9400 Workstation HP xw9400 Workstation HP xw9400 Workstation AMD Opteron TM PCI Express x16 64 PCI Express x16 2 USB2.0 8 IEEE1394 2 8DIMM HP HP xw9400 Workstation HP CPU HP CPU 240W CPU HP xw9400 HP CPU CPU CPU CPU Sound

More information

1 2 4 5 9 10 12 3 6 11 13 14 0 8 7 15 Iteration 0 Iteration 1 1 Iteration 2 Iteration 3 N N N! N 1 MOPT(Merge Optimization) 3) MOPT 8192 2 16384 5 MOP

1 2 4 5 9 10 12 3 6 11 13 14 0 8 7 15 Iteration 0 Iteration 1 1 Iteration 2 Iteration 3 N N N! N 1 MOPT(Merge Optimization) 3) MOPT 8192 2 16384 5 MOP 10000 SFMOPT / / MOPT(Merge OPTimization) MOPT FMOPT(Fast MOPT) FMOPT SFMOPT(Subgrouping FMOPT) SFMOPT 2 8192 31 The Proposal and Evaluation of SFMOPT, a Task Mapping Method for 10000 Tasks Haruka Asano

More information

IPSJ SIG Technical Report Vol.2010-GN-74 No /1/ , 3 Disaster Training Supporting System Based on Electronic Triage HIROAKI KOJIMA, 1 KU

IPSJ SIG Technical Report Vol.2010-GN-74 No /1/ , 3 Disaster Training Supporting System Based on Electronic Triage HIROAKI KOJIMA, 1 KU 1 2 2 1, 3 Disaster Training Supporting System Based on Electronic Triage HIROAKI KOJIMA, 1 KUNIAKI SUSEKI, 2 KENTARO NAGAHASHI 2 and KEN-ICHI OKADA 1, 3 When there are a lot of injured people at a large-scale

More information

Second-semi.PDF

Second-semi.PDF PC 2000 2 18 2 HPC Agenda PC Linux OS UNIX OS Linux Linux OS HPC 1 1CPU CPU Beowulf PC (PC) PC CPU(Pentium ) Beowulf: NASA Tomas Sterling Donald Becker 2 (PC ) Beowulf PC!! Linux Cluster (1) Level 1:

More information

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS 2 3 4 5 2. 2.1 3 1) GPS Global Positioning System

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS 2 3 4 5 2. 2.1 3 1) GPS Global Positioning System Vol. 52 No. 1 257 268 (Jan. 2011) 1 2, 1 1 measurement. In this paper, a dynamic road map making system is proposed. The proposition system uses probe-cars which has an in-vehicle camera and a GPS receiver.

More information

001.dvi

001.dvi THE SCIENCE AND ENGINEERING DOSHISHA UNIVERSITY, VOL.XX, NO.Y NOVEMBER 2003 Construction of Tera Flops PC Cluster System and evaluation of performance using Benchmark Tomoyuki HIROYASU * Mitsunori MIKI

More information

Microsoft Word - 0_0_表紙.doc

Microsoft Word - 0_0_表紙.doc 2km Local Forecast Model; LFM Local Analysis; LA 2010 11 2.1.1 2010a LFM 2.1.1 2011 3 11 2.1.1 2011 5 2010 6 1 8 3 1 LFM LFM MSM LFM FT=2 2009; 2010 MSM RMSE RMSE MSM RMSE 2010 1 8 3 2010 6 2010 6 8 2010

More information

HP High Performance Computing(HPC)

HP High Performance Computing(HPC) HP High Performance Computing HPC HPC HPC HPC HPC 1000 HPHPC HPC HP HPC HPC HPCHP HPC HP HPHPC HPC HP HPC HP IT IDCHP HPC 4 1 HPC HPCNo.1 HPCTOP5002008 6 HP 183 37% HP HPCHP B 1 Other 2Q08 HPC 2 20% 27%

More information

A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member

A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member (University of Tsukuba), Yasuharu Ohsawa, Member (Kobe

More information

6 2. AUTOSAR 2.1 AUTOSAR AUTOSAR ECU OSEK/VDX 3) OSEK/VDX OS AUTOSAR AUTOSAR ECU AUTOSAR 1 AUTOSAR BSW (Basic Software) (Runtime Environment) Applicat

6 2. AUTOSAR 2.1 AUTOSAR AUTOSAR ECU OSEK/VDX 3) OSEK/VDX OS AUTOSAR AUTOSAR ECU AUTOSAR 1 AUTOSAR BSW (Basic Software) (Runtime Environment) Applicat AUTOSAR 1 1, 2 2 2 AUTOSAR AUTOSAR 3 2 2 41% 29% An Extension of AUTOSAR Communication Layers for Multicore Systems Toshiyuki Ichiba, 1 Hiroaki Takada, 1, 2 Shinya Honda 2 and Ryo Kurachi 2 AUTOSAR, a

More information

スライド 1

スライド 1 GPU クラスタによる格子 QCD 計算 広大理尾崎裕介 石川健一 1.1 Introduction Graphic Processing Units 1 チップに数百個の演算器 多数の演算器による並列計算 ~TFLOPS ( 単精度 ) CPU 数十 GFLOPS バンド幅 ~100GB/s コストパフォーマンス ~$400 GPU の開発環境 NVIDIA CUDA http://www.nvidia.co.jp/object/cuda_home_new_jp.html

More information

チューニング講習会 初級編

チューニング講習会 初級編 GPU のしくみ RICC での使い方 およびベンチマーク 理化学研究所情報基盤センター 2013/6/27 17:00 17:30 中田真秀 RICC の GPU が高速に! ( 旧 C1060 比約 6.6 倍高速 ) RICCのGPUがC2075になりました! C1060 比 6.6 倍高速 倍精度 515GFlops UPCに100 枚導入 : 合計 51.5TFlops うまく行くと5 倍程度高速化

More information

システムソリューションのご紹介

システムソリューションのご紹介 HP 2 C 製品 :VXPRO/VXSMP サーバ 製品アップデート 製品アップデート VXPRO と VXSMP での製品オプションの追加 8 ポート InfiniBand スイッチ Netlist HyperCloud メモリ VXPRO R2284 GPU サーバ 製品アップデート 8 ポート InfiniBand スイッチ IS5022 8 ポート 40G InfiniBand スイッチ

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2016.06.06 2016.06.06 1 / 60 2016.06.06 2 / 60 Windows, Mac Unix 0444-J 2016.06.06 3 / 60 Part I Unix GUI CUI: Unix, Windows, Mac OS Part II 0444-J 2016.06.06 4 / 60 ( : ) 6 6 ( ) 6 10 6 16 SX-ACE 6 17

More information

Microsoft PowerPoint - endo-hokke13-kfc.pptx

Microsoft PowerPoint - endo-hokke13-kfc.pptx TSUBAME-KFC: 液 浸 冷 却 を 用 いた ウルトラグリーンスパコン 研 究 設 備 遠 藤 敏 夫 額 田 彰 松 岡 聡 東 京 工 業 大 学 学 術 国 際 情 報 センター 現 在 ~ 将 来 のスパコンは 電 力 あ たり 性 能 で 決 まる 現 実 的 なスパコンセンターの 電 力 の 限 界 は20MW 程 度 とされる Exaflopsのシステムを 実 現 する には

More information

HPE Moonshot System ~ビッグデータ分析&モバイルワークプレイスを新たなステージへ~

HPE Moonshot System ~ビッグデータ分析&モバイルワークプレイスを新たなステージへ~ Brochure HPE Moonshot System HPE Moonshot System 4.3U 45 HPE Moonshot System Xeon & HPE Moonshot System HPE Moonshot System HPE HPE Moonshot System &IoT & SoC Xeon D-1500 Broadwell-DE HPE ProLiant m510

More information

rank ”«‘‚“™z‡Ì GPU ‡É‡æ‡éŁÀŠñ›»

rank ”«‘‚“™z‡Ì GPU ‡É‡æ‡éŁÀŠñ›» rank GPU ERATO 2011 11 1 1 / 26 GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26 GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26 GPU rank/select wavelet tree balanced

More information

2017 (413812)

2017 (413812) 2017 (413812) Deep Learning ( NN) 2012 Google ASIC(Application Specific Integrated Circuit: IC) 10 ASIC Deep Learning TPU(Tensor Processing Unit) NN 12 20 30 Abstract Multi-layered neural network(nn) has

More information

Slides: TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments

Slides: TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments 計算機アーキテクチャ第 11 回 マルチプロセッサ 本資料は授業用です 無断で転載することを禁じます 名古屋大学 大学院情報科学研究科 准教授加藤真平 デスクトップ ジョブレベル並列性 スーパーコンピュータ 並列処理プログラム プログラムの並列化 for (i = 0; i < N; i++) { x[i] = a[i] + b[i]; } プログラムの並列化 x[0] = a[0] + b[0];

More information

ProLiant BL460c システム構成図

ProLiant BL460c システム構成図 HP BladeSystem c-class Server HP 2008 5 26 BLADE3.0 Web http://www.hp.com/jp/blade_fill/ 1 OVERVIEW HP 1 2 2.5 SAS H Xeon ( 2 ) (SFF)( 2 ) I/O PC2-5300 FB-DIMM DDR2-667 8 Smart E200i (Type Type 1 ) USB

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2018.09.10 [email protected] ( ) 2018.09.10 1 / 59 [email protected] ( ) 2018.09.10 2 / 59 Windows, Mac Unix 0444-J [email protected] ( ) 2018.09.10 3 / 59 Part I Unix GUI CUI:

More information

( )

( ) 1. 2. 3. 4. 5. ( ) () http://www-astro.physics.ox.ac.uk/~wjs/apm_grey.gif http://antwrp.gsfc.nasa.gov/apod/ap950917.html ( ) SDSS : d 2 r i dt 2 = Gm jr ij j i rij 3 = Newton 3 0.1% 19 20 20 2 ( ) 3 3

More information

HASC2012corpus HASC Challenge 2010,2011 HASC2011corpus( 116, 4898), HASC2012corpus( 136, 7668) HASC2012corpus HASC2012corpus

HASC2012corpus HASC Challenge 2010,2011 HASC2011corpus( 116, 4898), HASC2012corpus( 136, 7668) HASC2012corpus HASC2012corpus HASC2012corpus 1 1 1 1 1 1 2 2 3 4 5 6 7 HASC Challenge 2010,2011 HASC2011corpus( 116, 4898), HASC2012corpus( 136, 7668) HASC2012corpus HASC2012corpus: Human Activity Corpus and Its Application Nobuo KAWAGUCHI,

More information

iphone GPGPU GPU OpenCL Mac OS X Snow LeopardOpenCL iphone OpenCL OpenCL NVIDIA GPU CUDA GPU GPU GPU 15 GPU GPU CPU GPU iii OpenMP MPI CPU OpenCL CUDA OpenCL CPU OpenCL GPU NVIDIA Fermi GPU Fermi GPU GPU

More information

HP Workstation 総合カタログ

HP Workstation 総合カタログ HP Workstation 4 No. HP Z HP Z820 Workstation HP Z620 Workstation HP Z420 Workstation & CPU P.10 P.11 P.11 14.0in WIDE 15.6in WIDE 17.3in WIDE HP EliteBook 8470w Mobile Workstation HP EliteBook 8570w Mobile

More information

スパコンに通じる並列プログラミングの基礎

スパコンに通じる並列プログラミングの基礎 2018.06.04 2018.06.04 1 / 62 2018.06.04 2 / 62 Windows, Mac Unix 0444-J 2018.06.04 3 / 62 Part I Unix GUI CUI: Unix, Windows, Mac OS Part II 2018.06.04 4 / 62 0444-J ( : ) 6 4 ( ) 6 5 * 6 19 SX-ACE * 6

More information

mobicom.dvi

mobicom.dvi 13Dynamic Voltage Scaling on a Low-Power Microprocessor Johan Pouwelse 5 Koen Langendoen Henk Sips Faculty of Information Technology and Systems Delft University of Technology, The Netherlands 1 78724

More information

12 PowerEdge PowerEdge Xeon E PowerEdge 11 PowerEdge DIMM Xeon E PowerEdge DIMM DIMM 756GB 12 PowerEdge Xeon E5-

12 PowerEdge PowerEdge Xeon E PowerEdge 11 PowerEdge DIMM Xeon E PowerEdge DIMM DIMM 756GB 12 PowerEdge Xeon E5- 12ways-12th Generation PowerEdge Servers improve your IT experience 12 PowerEdge 12 1 6 2 GPU 8 4 PERC RAID I/O Cachecade I/O 5 Dell Express Flash PCIe SSD 6 7 OS 8 85.5% 9 Dell OpenManage PowerCenter

More information

EGunGPU

EGunGPU Super Computing in Accelerator simulations - Electron Gun simulation using GPGPU - K. Ohmi, KEK-Accel Accelerator Physics seminar 2009.11.19 Super computers in KEK HITACHI SR11000 POWER5 16 24GB 16 134GFlops,

More information