

Energy Performance for LINPACK Benchmark on DEGIMA

Abstract

GPU computing has lately attracted attention for its energy efficiency. Most GPU computing systems apply only coarse-grained optimization, aimed at power consumption rather than at energy efficiency. In this paper, we propose a fine-grained optimization method for energy-efficient GPU computing. We use an AMD/ATI Radeon HD 5870 GPU system and introduce a power consumption model that relates energy efficiency (Flops/Watt) to system parameters such as GPU frequency and voltage. We implement an energy-controllable library based on this model and apply it to the LINPACK benchmark. With our method, the energy efficiency of the LINPACK benchmark on an HD 5870 node of DEGIMA improved from 1.4698 GFlops/Watt to 1.9658 GFlops/Watt.

1. High Performance Computing and GPUs

[...] High Performance Computing (HPC) [...] TOP500 [...] LINPACK 1) [...]

1.1 HPC [...]

[...] November 2011 TOP500 [...] 30 [...] 2). [...] GPU [...] DEGIMA (DEstination for GPU Intensive MAchine) [...] LINPACK benchmark [...] (Rmax) [...] (Rpeak) [...] Fig. 1 [...]

Fig. 1: TOP500 [...] DEGIMA [...] K computer, TSUBAME2, T2K-tsukuba [...] Rmax, Rpeak.

1.2 TOP500 and GPUs

[...] In the November 2011 TOP500 list, 39 systems [...] (2 with AMD GPUs, 2 with Cell, [...] Nvidia GPUs 3)). [...] GPU [...] DEGIMA [...] TOP500 [...] GPU [...]

1.3 Green500

[...] Green500 [...] TOP500 [...] energy efficiency (Flops/W) [...] November 2011 [...] top 50 4) [...] 60% [...] GPU [...]

Fig. 2: Top 50 of the November 2011 Green500 list, energy efficiency (MFlops/W) versus rank. Labeled systems: Blue Gene/Q (IBM), GPU clusters, DEGIMA (Nagasaki Univ.), TSUBAME (Tokyo Tech), K computer (RIKEN).
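As a minimal sketch of the Flops/W metric used by the Green500, the helper below divides sustained LINPACK performance by power draw. The function names and the choice of units are illustrative assumptions; the sample input in the usage note is the K computer figure quoted in Section 1.4 (11.28 PFlops at 12.66 MW).

```c
#include <assert.h>
#include <stdio.h>

/* Energy efficiency in MFlops/W, given Rmax in PFlops and power in MW.
 * 1 PFlops = 1e9 MFlops and 1 MW = 1e6 W.
 * Names and units are illustrative, not taken from the paper. */
double mflops_per_watt(double rmax_pflops, double power_mw)
{
    assert(power_mw > 0.0);
    return (rmax_pflops * 1e9) / (power_mw * 1e6);
}

/* Prints one line with the efficiency of a named system. */
void print_efficiency(const char *name, double rmax_pflops, double power_mw)
{
    printf("%-12s %8.1f MFlops/W\n", name,
           mflops_per_watt(rmax_pflops, power_mw));
}
```

Calling `print_efficiency("K computer", 11.28, 12.66)` reports roughly 891 MFlops/W for the figures quoted in this paper.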

1.4 [...]

[...] In November 2011, the K computer [...] LINPACK benchmark [...] 10 PFlops [...] 100 [...] 1 [...] 100 [...] Fig. 3 [...] TOP500 [...] 2019 [...]

Fig. 3: TOP500 performance development, 1993 to 2019, on a scale from 10 GFlops to 1 EFlops [...] TOP500 #1 Rmax [...] 2019 5).

[...] The TOP500 No. 1 system, the K computer, reaches 11.28 PFlops on LINPACK at 12.66 MW [...] Flops/W [...]. A system 100 times as fast as the K computer [...] 1266 MW [...] 1 [...] 2.75 [...] 6). [...] Fig. 4 [...] Green500 [...] 4 [...] 5.67 [...] 2019 [...] 32 [...] 39.38 MW (= 12.66 MW × 100 / 32.149) [...] K computer [...] 3 [...]

Fig. 4: Energy efficiency (Flops/W) of the Green500 No. 1 system by year, 2007 to 2012.

2. AMD/ATI Radeon 5870

[...] DEGIMA [...] AMD/ATI Radeon 5870 GPU [...]

2.1 AMD/ATI Radeon 5870

[...] AMD/ATI Radeon 5870 [...] Table 1 7) [...]

  Engine clock speed                    [...] MHz
  Processing power (single precision)   2.72 TFlops
  Processing power (double precision)   544 GFlops
  Memory clock speed                    1.2 GHz
  Memory bandwidth                      153.6 GB/s

Table 1: ATI Radeon 5870 [...]

Table 2 [...] AMD/ATI Radeon 5870 GPU [...]. The Tesla M2090 costs about 17 times as much as the Radeon 5870, and the Radeon 5870 offers about 14 times the cost performance of the Tesla M2090.
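The factors of about 17 and 14 between the Radeon 5870 and the Tesla M2090 follow from the prices and peak figures listed in Table 2. As a minimal sketch, the helpers below recompute cost performance (MFlops per yen) and power performance (GFlops/W) from the double-precision peak, the TDP, and the price. The struct and function names are illustrative assumptions, and the yen unit is inferred from the kakaku.com price sources cited for Table 2.

```c
#include <assert.h>

/* One GPU column of Table 2. Field names are illustrative. */
typedef struct {
    const char *name;
    double peak_dp_gflops; /* double-precision peak, GFlops */
    double tdp_watt;       /* thermal design power, W */
    double price_yen;      /* price as of December 2011 */
} GpuSpec;

/* Values copied from Table 2. */
const GpuSpec RADEON_5870 = { "Radeon 5870", 544.0, 188.0, 25983.0 };
const GpuSpec TESLA_M2090 = { "Tesla M2090", 665.0, 225.0, 452025.0 };

/* Cost performance in MFlops per yen (1 GFlops = 1000 MFlops). */
double cost_performance(const GpuSpec *g)
{
    assert(g->price_yen > 0.0);
    return g->peak_dp_gflops * 1000.0 / g->price_yen;
}

/* Power performance in GFlops per watt of TDP. */
double power_performance(const GpuSpec *g)
{
    assert(g->tdp_watt > 0.0);
    return g->peak_dp_gflops / g->tdp_watt;
}
```

Dividing the two prices gives a factor of about 17, and dividing the two cost-performance values gives a factor of about 14.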

                                 Radeon 5870   Tesla C2070   Tesla M2090   GTX 580
  Process technology             40 nm         40 nm         40 nm         40 nm
  Peak (single precision)        2.72 TFlops   1.03 TFlops   1.33 TFlops   1.581 TFlops
  Peak (double precision)        544 GFlops    515 GFlops    665 GFlops    198 GFlops
  Power consumption (TDP)        188 W         215 W         225 W         244 W
  Price, December 2011 (yen)     25983         212746        452025        39580
  Cost performance (MFlops/yen)  20.94         2.42          1.45          5.00
  Power performance (GFlops/W)   2.894         2.395         2.956         0.81

Table 2: GPU [...]. Power consumption is the TDP (Thermal Design Power); prices are as of December 2011 8)9)10)11)12)13)14).

2.2 AMD GPU [...]

[...] AMD/ATI GPU [...] 3 [...] GPU [...] 3 [...] 9 [...]

AMD Display Library (ADL)

The AMD Display Library (ADL) [...] AMD/ATI GPU [...] C [...] 15). On the AMD Radeon 5870, ADL [...]:

  GPU engine frequency: 80 MHz to 1200 MHz, in 5 MHz steps
  GPU memory frequency: 150 MHz to 1400 MHz, in 5 MHz steps
  GPU core voltage: 1.062 V to 1.212 V, in 5 mV steps

[...] 3 [...] GPU [...]

2.3 AMD GPU [...]

AMD Radeon 5870 GPU [...]

  Level   Engine frequency (MHz)   Memory frequency (MHz)   Core voltage (V)
  2       [...]                    1200                     1.212
  1       600                      [...]                    1.112
  0       157                      [...]                    1.062

Table 3: ATI Radeon 5870 [...]

[...] Table 3 [...] levels 0, 1, 2 [...] GPU [...] Fig. 5 [...] LINPACK [...] Active Percent [...] GPU [...] ADL [...] GPU [...] Fig. 5 [...] LINPACK [...] (25 seconds) [...]

Fig. 5: State of the Radeon 5870 GPU while running LINPACK, over 25 seconds: GPU activity (Active Percent), GPU temperature (temp), GPU power level (Current lvl), and LINPACK GPU calls (GPU Call).
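The adjustable ranges reported for ADL in Section 2.2 can be captured in a small validity check. This is a sketch only: the function names are hypothetical, nothing here calls ADL, and it assumes the three ranges are engine frequency, memory frequency, and core voltage, each inclusive and aligned to its step counted from the lower bound.

```c
#include <assert.h>
#include <stdbool.h>

/* True if v lies in [lo, hi] and is a whole number of steps above lo. */
static bool on_grid(int v, int lo, int hi, int step)
{
    assert(step > 0);
    return v >= lo && v <= hi && (v - lo) % step == 0;
}

/* Checks a Radeon 5870 setting against the ranges given in Section 2.2:
 *   engine frequency   80 to 1200 MHz, 5 MHz steps
 *   memory frequency  150 to 1400 MHz, 5 MHz steps
 *   core voltage     1062 to 1212 mV,  5 mV steps
 * The voltage is passed in millivolts so that the step test uses exact
 * integer arithmetic. */
bool is_valid_setting(int engine_mhz, int memory_mhz, int voltage_mv)
{
    return on_grid(engine_mhz, 80, 1200, 5)
        && on_grid(memory_mhz, 150, 1400, 5)
        && on_grid(voltage_mv, 1062, 1212, 5);
}
```

Under these assumptions, the three core voltages used in the DGEMM experiments (1.062 V, 1.137 V, 1.212 V) all lie on the 5 mV grid.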

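[...] GPU [...] CPU [...] Fig. 5 [...] GPU calls (GPU Call) [...] ON/OFF [...] Fig. 5 [...] GPU temperature (temp) [...] GPU Call [...] Fig. 5 [...] GPU Call (GPU Call [...] 100) [...] GPU power level (Current lvl) [...] Fig. 5 [...] LINPACK [...] 25 [...] 2 [...] LINPACK [...] 0 [...] 2 [...] LINPACK [...] 2 [...] 0 [...]

[...] AMD GPU [...] Fig. 6 [...] LINPACK [...] ADL [...] Radeon 5870 [...] 2 [...] ADL [...] Fig. 6 [...] 37.4% [...] LINPACK [...] 21.4% [...] 27.7% [...] Fig. 5 [...] Fig. 7 [...] Fig. 7 [...] 18.4% [...]

3. AMD/ATI GPU [...]

AMD/ATI GPU [...]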

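               idle     high     low
  default      95.25    308.54   152.12
  this work    59.65    242.55   110.04

Fig. 6: Power consumption (Watt) of the Radeon 5870 GPU system when idle (idle), under high LINPACK load (high), and under low LINPACK load (low), for the default setting and for this work.

Fig. 7: Power consumption (Watt) over the 25 seconds of Fig. 5, for the default setting (default) and with changed parameters (change parameter).

3.1 AMD/ATI GPU [...] API

[...] API [...] GPU [...] 2 [...]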

[...] GPU Call [...] API [...] GPU [...]

3.2 API [...]

[...] API [...] (1) [...] (2) [...] 2 [...] Fig. 8 [...] 2 [...] API [...] C [...] the EL_SetHighestAutomatic API [...] the EL_SetLowestAutomatic API [...] EL_DEVICE [...] API [...] Radeon 5870 [...] the EL_Init API [...]

Fig. 8:

    int main()
    {
        EL_DEVICE dev = EL_Init(EN_DEVICE_TYPE_HD5870); // EnergyLibrary initialization
        ... host part ...
        EL_SetHighestAutomatic(dev);                    // EnergyLibrary API
        ... GPU part ...
        EL_SetLowestAutomatic(dev);                     // EnergyLibrary API
        ... host part ...
    }

[...] API [...] API [...] C [...] API [...] GPU [...] 2 [...] EL_SetLowestAutomatic [...] GPU [...] GPU [...] EL_SetHighestAutomatic [...] GPU [...] 3 [...]
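The calling pattern of Fig. 8 can be tried without a GPU by stubbing the library. In the sketch below only the names EL_DEVICE, EL_Init, EL_SetHighestAutomatic, EL_SetLowestAutomatic, and EN_DEVICE_TYPE_HD5870 come from the paper. Their signatures, the struct layout, the mapping to levels 2 and 0 of Table 3, and the helper run_gpu_section are assumptions made for illustration, and the stubs merely record the requested power level instead of calling ADL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the energy library's types. */
typedef enum { EN_DEVICE_TYPE_HD5870 } EN_DEVICE_TYPE;
typedef struct { EN_DEVICE_TYPE type; int level; } EL_DEVICE_STATE;
typedef EL_DEVICE_STATE *EL_DEVICE;

static EL_DEVICE_STATE g_device; /* single simulated device */

/* Stub: a real implementation would open the GPU through ADL. */
EL_DEVICE EL_Init(EN_DEVICE_TYPE type)
{
    g_device.type = type;
    g_device.level = 0;
    return &g_device;
}

/* Stub: records a request for the highest power level (assumed level 2). */
void EL_SetHighestAutomatic(EL_DEVICE dev)
{
    assert(dev != NULL);
    dev->level = 2;
}

/* Stub: records a request for the lowest power level (assumed level 0). */
void EL_SetLowestAutomatic(EL_DEVICE dev)
{
    assert(dev != NULL);
    dev->level = 0;
}

/* Brackets a GPU phase as in Fig. 8: raise the level, run the work,
 * lower the level again. Returns the level that was active during
 * the work. gpu_work may be NULL. */
int run_gpu_section(EL_DEVICE dev, void (*gpu_work)(void))
{
    EL_SetHighestAutomatic(dev);
    int level_during_work = dev->level;
    if (gpu_work != NULL) {
        gpu_work();
    }
    EL_SetLowestAutomatic(dev);
    return level_during_work;
}
```

Wrapping the pair of calls in one helper keeps the host parts at the low level even when several GPU phases are interleaved with host code.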

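4. LINPACK [...]

[...] API [...] GPU [...] GPU [...] DGEMM [...] LINPACK [...] 2 [...] DGEMM (Double-precision General Matrix Multiply) [...] LINPACK [...]

4.1 [...]

[...] Fig. 9 [...] AMD Radeon 5870 GPU [...]

Fig. 9: Measurement setup. AC 105 V supply, digital multimeter, power unit (500 W max), log recorder (PC), DC 3.3 to 12 V. Host computer: CPU Intel Core i5-2500T, 16 GB DDR3-1600, GPU AMD HD5870.

4.2 DGEMM [...]

[...] DGEMM [...] GPU [...] DGEMM N=42000 [...] GPU core voltage from 1.062 V to 1.212 V [...]: V=1.062 V, V=1.137 V, V=1.212 V [...]

4.2.1 GPU [...]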

[...] Fig. 10 [...] MHz [...] MHz [...] MHz [...] Fig. 10 [...] GPU [...]

Fig. 10: DGEMM performance (GFlops) for N=M=42000 as a function of the GPU engine and memory frequencies, in three panels for GPU core voltage V=1.062 V, V=1.137 V, and V=1.212 V.

4.2.2 GPU [...]

[...] Fig. 11 [...] GPU [...] GPU [...]

4.2.3 [...]

Fig. 11: DGEMM power consumption (Watt) for N=M=42000 as a function of the GPU engine and memory frequencies, in three panels for GPU core voltage V=1.062 V, V=1.137 V, and V=1.212 V.

[...] Fig. 12 [...] GPU [...] GPU [...] GPU [...] 770 MHz [...] MHz [...] 1.062 V [...]

4.3 LINPACK [...]

[...] LINPACK [...] GPU [...] LINPACK N=39680, NB=1280 [...] GPU core voltage V=1.062 V [...]

4.3.1 GPU [...]

IPSJ SIG Technical Report

Fig. 12: DGEMM energy efficiency (GFlops/W) for N=M=42000 as a function of the GPU engine and memory frequencies, in three panels for GPU core voltage V=1.062 V, V=1.137 V, and V=1.212 V.

[...] Fig. 13 [...] DGEMM [...] GPU [...] DGEMM [...] LINPACK [...] GPU [...]

4.3.2 GPU [...]

[...] Fig. 14 [...] DGEMM [...] GPU [...] DGEMM [...] LINPACK [...] GPU [...]

4.3.3 [...]

[...] Fig. 15 [...] DGEMM [...]

Fig. 13: LINPACK performance (GFlops) for N=39680, NB=1280 at GPU core voltage V=1.062 V, as a function of the GPU engine and memory frequencies.

Fig. 14: LINPACK power consumption (Watt) for N=39680, NB=1280 at GPU core voltage V=1.062 V, as a function of the GPU engine and memory frequencies.

[...] GPU [...] GPU [...] DGEMM [...] LINPACK [...] GPU [...] 770 MHz [...] MHz [...] 1.062 V [...]

5. [...] DGEMM [...]

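Fig. 15: LINPACK energy efficiency (GFlops/W) for N=39680, NB=1280 at GPU core voltage V=1.062 V, as a function of the GPU engine and memory frequencies.

5.1 [...]

[...] DGEMM [...] ( ) [...] DGEMM [...] ( ) [...]

5.2 [...]

[...] Fig. 12 [...] 1 [...] Fig. 16 [...] 765 MHz [...] 930 MHz [...] 1.062 V [...]

\[
\begin{aligned}
f(f_{eng}, f_{mem}) ={}& 1.73659 \times 10^{-18} f_{eng}^{4} + 1.0627 \times 10^{-18} f_{mem}^{4} - 4.38584 \times 10^{-18} f_{eng}^{3} f_{mem} \\
& - 1.24579 \times 10^{-17} f_{eng} f_{mem}^{3} + 1.81337 \times 10^{-17} f_{eng}^{2} f_{mem}^{2} + 9.99745 \times 10^{-13} f_{eng}^{3} \\
& + 5.95166 \times 10^{-13} f_{mem}^{3} - 2.1966 \times 10^{-12} f_{eng}^{2} f_{mem} + 4.48017 \times 10^{-13} f_{eng} f_{mem}^{2} \\
& - 2.96352 \times 10^{-8} f_{eng}^{2} - 9.68834 \times 10^{-8} f_{mem}^{2} + 1.39058 \times 10^{-7} f_{eng} f_{mem} \\
& - 2.27374 \times 10^{-3} f_{eng} + 1.95304 \times 10^{-3} f_{mem} + 9.80607 \times 10^{-1}
\end{aligned}
\tag{1}
\]

5.3 [...]

[...] Fig. 17 [...] Equations (2) to (6) [...] W_eng [...] W_mem [...] f [...] V [...] W_host [...]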

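Fig. 16: Model 1: DGEMM energy efficiency (GFlops/W) for N=42000 as a function of the GPU engine and memory frequencies [...] (Fig. 12).

Fig. 17: Power flow of the system: W_in enters the power unit, W_powerunit [...] the power unit, and W_out leaves it to supply the host computer (W_host), the GPU engine (W_eng), and the GPU memory (W_mem).

[...] GPU [...] GPU [...] Equations (4), (5), (6) [...] Fig. 10 [...] GPU [...] GPU [...] Equations (7), (8) [...]

\[ W_{powerunit} = W_{in} - W_{out} \tag{2} \]
\[ W_{out} = W_{eng} + W_{mem} + W_{host} \tag{3} \]
\[ W_{eng} = k_{eng} f_{eng} V^{2} \tag{4} \]
\[ W_{mem} = k_{mem} f_{mem} V^{2} \tag{5} \]
\[ W_{host} = \mathrm{Const} \tag{6} \]
\[ S(\vec{f}) = S(f_{eng}, f_{mem}) \tag{7} \]
\[ E = \frac{S(\vec{f})}{W_{in}(\vec{f}, V)} \tag{8} \]

5.3.1 [...]

[...] 1 [...] 10 [...]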
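Equations (2) to (8) translate directly into code. The sketch below is an illustration under stated assumptions, not the authors' implementation: K_ENG and K_MEM are obtained by dividing the linear coefficients of the fitted W_out in Equation (10) by V^2 at V = 1.062 V, W_HOST is taken to be the constant term of that fit, frequencies are assumed to be in units of 10 kHz (as the tick values 72000 to 84000 in Fig. 18 suggest), and the power-unit efficiency W_out / W_in is a caller-supplied constant rather than the measured curve of Fig. 19.

```c
#include <assert.h>

/* Model constants (assumptions, see the paragraph above). */
#define V_FIT  1.062                          /* volts assumed for the Eq. (10) fit */
#define K_ENG  (3.60009e-4 / (V_FIT * V_FIT)) /* W per (10 kHz * V^2) */
#define K_MEM  (4.33553e-4 / (V_FIT * V_FIT)) /* W per (10 kHz * V^2) */
#define W_HOST 89.9443                        /* Eq. (6): constant host power, W */

/* Eq. (4): GPU engine power. */
double w_eng(double f_eng, double v) { return K_ENG * f_eng * v * v; }

/* Eq. (5): GPU memory power. */
double w_mem(double f_mem, double v) { return K_MEM * f_mem * v * v; }

/* Eq. (3): power delivered by the power unit. */
double w_out(double f_eng, double f_mem, double v)
{
    return w_eng(f_eng, v) + w_mem(f_mem, v) + W_HOST;
}

/* Eq. (2) rearranged: wall power, with the power-unit loss expressed
 * through an assumed constant efficiency W_out / W_in. */
double w_in(double f_eng, double f_mem, double v, double psu_efficiency)
{
    assert(psu_efficiency > 0.0 && psu_efficiency <= 1.0);
    return w_out(f_eng, f_mem, v) / psu_efficiency;
}

/* Eq. (8): energy efficiency in GFlops/W for a performance s_gflops. */
double energy_efficiency(double s_gflops, double f_eng, double f_mem,
                         double v, double psu_efficiency)
{
    return s_gflops / w_in(f_eng, f_mem, v, psu_efficiency);
}
```

With these constants, `w_out(76500, 93000, 1.062)` evaluates to about 157.8 W, and raising the core voltage from 1.062 V to 1.212 V scales the engine and memory terms by (1.212 / 1.062)^2, roughly 1.30.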

[...] Equations (9), (10) [...] Figs. 18, 20, 21, 22 [...] 765 MHz [...] 920 MHz [...] 1.062 V [...]

\[
\begin{aligned}
f(f_{eng}, f_{mem}) ={}& 2.28649 \times 10^{-16} f_{eng}^{4} + 1.9017 \times 10^{-16} f_{mem}^{4} - 5.86035 \times 10^{-16} f_{eng}^{3} f_{mem} \\
& - 2.34738 \times 10^{-15} f_{eng} f_{mem}^{3} + 3.14867 \times 10^{-15} f_{eng}^{2} f_{mem}^{2} + 1.39896 \times 10^{-10} f_{eng}^{3} \\
& + 1.15289 \times 10^{-10} f_{mem}^{3} - 4.23504 \times 10^{-10} f_{eng}^{2} f_{mem} + 1.25702 \times 10^{-10} f_{eng} f_{mem}^{2} \\
& + 1.64027 \times 10^{-7} f_{eng}^{2} - 2.02001 \times 10^{-5} f_{mem}^{2} + 2.34208 \times 10^{-5} f_{eng} f_{mem} \\
& - 6.31176 \times 10^{-1} f_{eng} + 5.47227 \times 10^{-1} f_{mem} - 1.99416 \times 10^{-1}
\end{aligned}
\tag{9}
\]

Fig. 18: [...] DGEMM performance s(x,y) (GFlops) for N=M=42000 as a function of the GPU engine and memory frequencies.

\[ W_{out} = 3.60009 \times 10^{-4} f_{eng} + 4.33553 \times 10^{-4} f_{mem} + 89.9443 \tag{10} \]

Fig. 19: W_out (W) versus W_in (W) [...] 16) [...]

6. [...]

[...] Fig. 12 [...] DGEMM [...]

Fig. 20: [...] DGEMM power consumption w(x,y) (Watt) for N=42000 as a function of the GPU engine and memory frequencies [...] (Fig. 11) [...] 3 [...]

Fig. 21: Model 2: DGEMM energy efficiency (GFlops/W) for N=42000 [...] Figs. 18 and 20 [...]

Fig. 22: Relative error (0 to 0.007) [...] Fig. 16 [...]

[...] 770 MHz [...] MHz [...] 1.062 V [...] LINPACK [...]

6.1 [...]

[...] Fig. 23 [...] LINPACK [...] GPU [...] AMD/ATI Radeon 5870 [...]

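Fig. 23: [...] AMD/ATI Radeon 5870 [...] over 140 seconds, for the default setting (Normal) and this work (This work) [...] GPU [...]

6.2 [...]

[...] AMD/ATI Radeon 5870 [...] Fig. 24 [...] LINPACK [...] from 220.972 W to 159.552 W, a reduction of 27.8% [...]

                       Engine frequency   Memory frequency   Core voltage
  Measured (Fig. 12)   770 MHz            [...] MHz          1.062 V
  Model 1 (Fig. 16)    765 MHz            930 MHz            1.062 V
  Model 2 (Fig. 21)    765 MHz            920 MHz            1.062 V

Table 4: [...]

6.3 [...]

[...] DGEMM [...] Table 4 [...]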

Fig. 24: Power (W) and energy (Wh) over 140 seconds of LINPACK on the AMD [...], for the default setting (normal) and this work (this work), together with their difference (this work − normal).

[...] Table 5 [...] Fig. 25 [...] Table 4 [...] LINPACK [...] On the AMD/ATI Radeon 5870, the energy efficiency of LINPACK improved from 1.4698 GFlops/W to 1.9658 GFlops/W, an improvement of 33.7%.

  Setting                      Performance     Power       Energy efficiency
  default                      324.8 GFlops    220.972 W   1.4698 GFlops/W
  this work                    310.6 GFlops    159.552 W   1.9472 GFlops/W
  this work (with model 1)     309.1 GFlops    162.794 W   1.8993 GFlops/W
  this work (with model 2)     308.6 GFlops    157.017 W   1.9658 GFlops/W

Table 5: LINPACK [...] (N=39680, NB=1280).
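The efficiency column of Table 5 and the quoted improvements can be recomputed from the measured performance and power. A minimal sketch follows; the struct and function names are illustrative, and the row labels are assumed to follow the order of the bars in Fig. 25. Because the inputs are the rounded table values, the recomputed efficiencies agree with the table only to about three significant digits.

```c
#include <assert.h>
#include <stdio.h>

/* One row of Table 5. Names are illustrative. */
typedef struct {
    const char *label;
    double gflops; /* LINPACK performance */
    double watt;   /* average power */
} LinpackRun;

const LinpackRun RUNS[4] = {
    { "default",                  324.8, 220.972 },
    { "this work",                310.6, 159.552 },
    { "this work (with model 1)", 309.1, 162.794 },
    { "this work (with model 2)", 308.6, 157.017 },
};

/* Energy efficiency of one run in GFlops/W. */
double gflops_per_watt(const LinpackRun *r)
{
    assert(r->watt > 0.0);
    return r->gflops / r->watt;
}

/* Relative change from a to b, in percent (negative for a reduction). */
double percent_change(double a, double b)
{
    assert(a != 0.0);
    return (b - a) / a * 100.0;
}

/* Prints every run with its recomputed efficiency. */
void print_runs(void)
{
    for (int i = 0; i < 4; i++) {
        printf("%-26s %.4f GFlops/W\n", RUNS[i].label,
               gflops_per_watt(&RUNS[i]));
    }
}
```

Here `percent_change(1.4698, 1.9658)` gives the 33.7% efficiency gain, and `percent_change(220.972, 159.552)` gives the 27.8% power reduction as a negative number.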

Fig. 25: LINPACK results (N=39680, NB=1280): performance (GFlops), power (Watt), and energy efficiency (GFlops/W) for default, this work, this work (with model 1), and this work (with model 2).

7. [...]

[...] GPU [...] LINPACK [...] DGEMM [...] DGEMM [...] 2 [...] 3 [...] DGEMM [...] LINPACK [...] 1.4698 GFlops/W [...] 1.9658 GFlops/W [...] June 2011 [...] November 2011 [...] Green500 [...]

References

1) J. Dongarra, LINPACK: Users' Guide, ser. Miscellaneous Bks. Society for Industrial and Applied Mathematics, 1979. [Online]. Available: http://books.google.co.jp/books?id=amsm1n3vw0cc
2) Top 500 countries share for 11/2011, 2011. [Online]. Available: http://www.top500.org/charts/list/38/countries
3) Top 500 press release, 2011. [Online]. Available: http://www.top500.org/lists/2011/11/press-release

4) The Green500 list, November 2011, 2011. [Online]. Available: http://www.green500.org/lists/2011/11/top/list.php?from=1&to=100
5) Top500 performance development, 2011. [Online]. Available: http://www.top500.org/lists/2011/06/performance_development
6) [...], 2011. [Online]. Available: http://www.green500.org/lists/2011/11/top/list.php?from=1&to=100
7) ATI Radeon HD 5870 graphics specifications, 2011. [Online]. Available: http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-5000/hd-5870/Pages/ati-radeon-hd-5870-overview.aspx#2
8) Nvidia Tesla C2050 / C2070 GPU, 2011. [Online]. Available: http://www.nvidia.co.jp/object/product_tesla_C2050_C2070_jp.html
9) Next IO vCORE Extreme [...], 2011. [Online]. Available: http://www.elsa-jp.co.jp/products/nextio/vcore_extreme/index.html
10) G. Chen, L. Chacón, and D. C. Barnes, An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm, ArXiv e-prints, Nov. 2011.
11) [...].com, EAH5870/2DIS/1GD5/V2 (PCIExp 1GB), 2011. [Online]. Available: http://kakaku.com/item/k0000102777/
12) Nvidia Tesla C2070 [PCIExp 6GB], 2011. [Online]. Available: http://kakaku.com/item/k0000264157/?lid=ksearch_kakakuitem_title
13) NTT-X Store, 2011. [Online]. Available: http://nttxstore.jp/_II_HP13647981
14) Giada GTX580-DDR5 [PCIExp 1.5GB], 2011. [Online]. Available: http://kakaku.com/item/k0000321156/?lid=ksearch_kakakuitem_title
15) AMD Display Library (ADL) SDK, 2011. [Online]. Available: http://developer.amd.com/sdks/adlsdk/pages/default.aspx
16) 80 PLUS verification and testing report, 2011. [Online]. Available: http://www.acbel.com/productfile/80plus/acbel_PC6024_W_Report.pdf