On the Proposed System Configuration for the Next-Generation Supercomputer

Transcription:

6 19 4 27

1. 2. 3. 3.1 3.2 A 3.3 B 4. 5. 2007/4/27 4 1

1. 2007/4/27 4 2

NEC NHF2 18 9 19 19 2 28 10PFLOPS2.5PB 30MW 3,200 18 12 12 SimFold, GAMESS, Modylas, RSDFT, NICAM, LatticeQCD, LANS HPL, NPB-FT 19 2 28 2007/4/27 4 3

NH system configuration (figure)
System: 1,280 N, 40,960 CPU, 163,840 cores, 10.48PFLOPS
Memory: 2.5PB (2TB per N)
Interconnect: Fat-tree, 16GB/s x 16 links x 2 per N, 20Gbps; switches SW0 #00 to #79, SW1 #00 to #79, SW2 #00 to #63
Power: 17.5MW (Linpack)
N (NUMA): 32CPU, 128Core, 8.19TFLOPS, 2TB
CPU: 256GFLOPS, L2$: 8MB, MEM: 64GB at 128GB/s
Core: 2GHz, 64GFLOPS (2FMA x 8VPP)

NH: features
CPU: 45nm, 256GFLOPS; 4 cores, 2GHz; 2FMA x 8; 128KB; 8MB L2; RDB (Reusable Data Buffering); 1 CPU: 4 SMP
System: 40,960 CPU, 10.48PFLOPS, 2.5PB; N = 32 CPU; OS, MPI
Power: 140W per CPU (Linpack)
Interconnect: 328TB/s, 3-stage Fat tree, 1,280 N

NH OS: LinuxIO OS : : OpenMP MPI : Fortran HPF CAF C/C++ MPI 2007/4/27 4 6

F system configuration (figure)
System: 82,944 CPU, 663,552 cores, 10.61PFLOPS
Memory: 2.53PB (32GB per CPU)
Interconnect: ToFu (+3D), 18CPU per unit, 4608 units in 3D; 5.0GB/s; 30GB/s x 6; 2.5GB/s x 8 links x 2; 180GB/s
Power: 15.5MW (Linpack)
CPU: 2GHz, 128GFLOPS (8Cores), L2$: 6MB, MEM: 32GB at 64GB/s
Core: SIMD(4FMA), 16GFLOPS

F: features
CPU: 45nm, 1CPU (LSI), 128GFLOPS; 8 cores, 2GHz; FP 128; SPARC-V9; SIMD (4FMA); HPC; 6MB L2
System: 82,944 CPU, 10.6PFLOPS, 2.53PB
Power: Linpack 58W/CPU; 20
Interconnect: ToFu (Torus-connected Full connection); 18CPU; 3D

F OS POSIX UNIX OS OpenMP MPI : 8SMP8SMP D ToFu Fortran XP Fortran HPF CAF C/C++ MPI 2007/4/27 4 9

                NH                   F
PFLOPS          10.48                10.61
PB              2.50                 2.53
PB              140                  140
m2              1,446 / 2,976        1,475 / 3,198
MW              17.5 / 23 Linpack    15.5 / 22.8 Linpack
CPU             40,960               82,944
Core            163,840              663,552
Interconnect    Fat Tree             D
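The peak figures for NH and F can be reproduced from the per-core numbers given on the configuration slides. The following is a minimal sketch, not part of the original deck. It assumes one FMA counts as two floating-point operations per cycle, and the function names are hypothetical. The slide values (10.48 and 10.61 PFLOPS) appear to be truncated rather than rounded.

```python
def peak_gflops_per_core(clock_ghz, fma_units, lanes=1):
    """Peak GFLOPS of one core, assuming each FMA is 2 flops per cycle."""
    return clock_ghz * fma_units * lanes * 2


def system_pflops(cpus, cores_per_cpu, core_gflops):
    """Peak PFLOPS of a system built from identical CPUs."""
    return cpus * cores_per_cpu * core_gflops / 1e6


# NH: 2 GHz, 2 FMA x 8 VPP per core, 4 cores per CPU, 40,960 CPUs
nh_core = peak_gflops_per_core(2, 2, lanes=8)
nh_total = system_pflops(40_960, 4, nh_core)

# F: 2 GHz, SIMD with 4 FMA per core, 8 cores per CPU, 82,944 CPUs
f_core = peak_gflops_per_core(2, 4)
f_total = system_pflops(82_944, 8, f_core)

print(f"NH: {nh_core} GFLOPS/core, {nh_total:.5f} PFLOPS")
print(f"F : {f_core} GFLOPS/core, {f_total:.5f} PFLOPS")
```

The per-core results (64 and 16 GFLOPS) and per-CPU results (256 and 128 GFLOPS) match the values printed on the slides.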

NH F GHz 2 GFLOPS 64 16 16: 2FMA x 8VPP) SIMD 4FMA 256 64 128 GFLOPS 256 128 4 8 CPU Byte/Flop L2 0.5 MB 8 6 Byte/Flop 4 2 2007/4/27 4 11

2GHz Thin Fat NH 40,960 F 82,944 () HPC 2007/4/27 4 12

2 NH 4 16 F 8 66 NH F SIMD NH Fat Tree F D 2007/4/27 4 13

21 9 7
SimFold, GAMESS, Modylas, RSDFT, NICAM, LatticeQCD, LANS, HPL (High Performance Linpack), NPB-FT

[Figure: bar chart of PFLOPS (axis 0 to 12) for NH and F on SimFold, GAMESS, Modylas, RSDFT, NICAM, LatticeQCD, LANS, HPL, NPB-FT]

[Figure: NH and F side by side per application (axis 0.0 to 2.5) for SimFold, GAMESS, Modylas, RSDFT, NICAM, LatticeQCD, LANS, NPB-FT; callouts on NH for LatticeQCD and LANS, and on RSDFT and NPB-FT]

12 NH F PFLOPS BMT 2007/4/27 4 17

10PFLOPS2.5PB 30MW 3,200 BMT CPU F NH F NH 2007/4/27 4 18

2 22 2 2 2007/4/27 4 19

2. 2007/4/27 4 20

1. LINPACK 10PFLOPS 2. 10PFLOPS 10PFLOPS 3-5PFLOPS PC 3. 3PFLOPS 3PFLOPS 1PFLOPS 2007/4/27 4 21

F 10PFLOPSNH 3PFLOPS 121 3 A B Fat Tree Fat Tree ToFu Fat Tree NIC F NH F NH F NH ToFu 10 3 10 3 10 3 2007/4/27 4 22

FNH 10PFLOPS 3PFLOPS Linpack 10PFLOPS A B ToFu Fat Tree F NH 2007/4/27 4 23

1/3 + 2007/4/27 4 24

2/3 SIMD 2007/4/27 4 25

3/3 CPU 2007/4/27 4 26

3. 2007/4/27 4 27

A, B (ToFu, Fat Tree)
LINPACK 10PFLOPS: A 10PFLOPS, B 3PFLOPS
A: 11.2PFLOPS x 85% (LINPACK) = 9.52PFLOPS
B: 3.1PFLOPS x 90% (LINPACK) = 2.79PFLOPS
1.2TB/ 15PB F 80PB NH 5PB
A+B LINPACK: 90% 11.08PFLOPS, 85% 10.46PFLOPS, 80% 9.85PFLOPS
A 1/8 B/FLOPS, B 1/4 1/8 B/FLOPS
100PB
A B10 A B B
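The LINPACK estimates on this slide are simple products of peak performance and an assumed efficiency. The sketch below reproduces them. The three A+B figures are consistent with applying a further efficiency factor to the sum of the A and B estimates. That reading is an assumption drawn from the arithmetic, not something the slide states, and the function name is hypothetical.

```python
def linpack_pflops(peak_pflops, efficiency):
    """Estimated LINPACK performance from peak performance and efficiency."""
    return peak_pflops * efficiency


a = linpack_pflops(11.2, 0.85)   # unit A
b = linpack_pflops(3.1, 0.90)    # unit B

# Assumed reading: combined run = (A + B) x combined-run efficiency
combined = {eff: round((a + b) * eff, 2) for eff in (0.90, 0.85, 0.80)}

print(round(a, 2), round(b, 2))
print(combined)
```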

On-the-fly 2007/4/27 4 29

On-the-fly 10PFLOPS t 1 t 2 t 3 2, 2, 2, t 1 t 2 t 3 A B 2007/4/27 4 30

On-the-fly 10A A 2 10TB B 10PFLOPS 3PFLOPS 10TB 10TB 1PFLOPS 2 on N 1 2 10TB 10TB 2 on N 2 2 10TB 10TB 2 on N n 2 21.6 16GB/CPU 1.0 1TB/ A B 2007/4/27 4 31

- e - e I - I - 3 3PF 1PF 30GB 40TB 45GB 0.3GB SCF-CI 4GB 3GB 2007/4/27 4 32

A 10PFLOPS 15PB A ToFu F 80PB B Fat Tree NH 5PB B 1PFLOPS 1TB NUMA 2007/4/27 4 33

A 13PF A 10PF+B 3PF (1PF )10 A 13PF 100 30 130 1.7 1.16 A 10PF 100 51 B 3PF 151 5,000 800 4,500 B : NICAM 1 : 1.9 LANS 1 : 1.5 2007/4/27 4 34

3.1 2007/4/27 4 35

               Total      A          B
CPU            99,840     87,552     12,288
Core           749,568    700,416    49,152
PFLOPS         14.3       11.2       3.14
Memory (PB)    1.7-2.1    1.34       0.375-0.75
Power (MW)     24         15.2       6.8
Area           3,800      1,900      900

Storage: 100PB total; 15PB, 5PB, 80PB; 1.2TB/s; 2.0MW; 700
Per PFLOPS: 1.68MW/PFLOPS, 266 /PFLOPS
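The per-PFLOPS ratios and the totals on this slide can be cross-checked from the individual figures. This is an illustrative sketch with hypothetical variable names. It treats 3,800 as an area figure, which is an assumption based on the ratio quoted next to it.

```python
total_pflops = 14.3
total_power_mw = 24.0
total_area = 3_800

# Ratios quoted on the slide: 1.68 MW/PFLOPS and 266 per PFLOPS
mw_per_pflops = total_power_mw / total_pflops
area_per_pflops = total_area / total_pflops

# The three power figures on the slide add up to the 24 MW total
power_sum = 15.2 + 6.8 + 2.0

# Core counts of A and B add up to the total core count
core_sum = 700_416 + 49_152

print(round(mw_per_pflops, 2), round(area_per_pflops), power_sum, core_sum)
```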

MPI A MPI B 2007/4/27 4 37

A B ACL MPI API 2007/4/27 4 38

A B 57m 52m A B 1,900 900 700 54.5m 3,800 36m 70m 12.5m 17m 2007/4/27 4 39

2007/4/27 4 40 2007 2008 2009 2010 2011 A LSI OS LSI OS B 2 2 2 2 1 1 2 2 1 1 1 1

3.2 A 2007/4/27 4 41

A system configuration (figure)
System: 87,552 CPU, 700,416 cores, 11.2PFLOPS
Memory: 1.34PB (16GB per CPU)
Interconnect: ToFu (+3D), 18CPU per unit, 20x16x16 = 5,120 in 3D; 30GB/s x 6; 2.5GB/s x 8 links x 2; 180GB/s
Power: 15.2MW (Linpack)
CPU: 2GHz, 128GFLOPS (8Cores), L2$: 6MB, MEM: 16GB at 64GB/s
Core: SIMD(4FMA), 16GFLOPS

A: features
CPU: 45nm, 1CPU (LSI), 128GFLOPS; 8 cores, 2GHz; FP 128; SPARC-V9; SIMD (4FMA); HPC; 6MB L2
Power: 42W/CPU; Linpack 58W/CPU; 20
Interconnect: ToFu (Torus-Full connection); 18CPU; 3

8128FP 2GHz SIMD 4 4 16GFLOPS CPU 128GFLOPS 6MB 64GB/s 32GB/s 32GB/s L2 L1 2B/FLOP L2 0.5B/FLOP CPU 128GF 16GFx8 2GH 8 2 2SIMD 2 2SIMD 2 2 1 1 8KB(2way) 116KB(2way) 2 6MB(12way) 64GB/s 2007/4/27 4 44

SIMD 4,8 (1) SIMD 2 (2) SIMD 4 Basic FPR(%b0-%b63) FPR(%e0-%e63) FPR(%b0-%b63) FPR(%e0-%e63) FMA FMA FMA FMA A-pipe B-pipe C-pipe D-pipe FMA FMA FMA FMA A-pipe B-pipe C-pipe D-pipe Extend (3) SIMD1 SIMD2 FPR(%b0-%b63) FPR(%e0-%e63) FPR(%b0-%b63) FPR(%e0-%e63) FMA FMA FMA FMA A-pipe B-pipe C-pipe D-pipe FMA FMA FMA FMA A-pipe B-pipe C-pipe D-pipe 2007/4/27 4 45

CPU 16GB SBCPU 2 32GB ICC Interconnect Controller CPU-ICC 32GB/s ICCPCI Express gen2 DIMM DIMM 32GB/s 32GB/s DIMM CPU CPU 32GB/s 32GB/s 82GB/s ICC DIMM 32GB/s 32GB/s PCIe Gen2 4GB/s x3 ToFu 6.4Gbps / differential pair PCIe Gen2... 5Gbps / differential pair Full / ToFu 5GB/s x8 Torus / ToFu 10GB/s x 2(+1) 2007/4/27 4 46

ToFu (Torus-connected Full-connection)
2, 9SB, 2.5GB/s
20x16x16, 3: 5GB/s x 3 x 2 = 30GB/s
0.1 1.6 0.8
MPI 1.1 2.6 1.8
3D
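To make the 3D torus part of ToFu concrete, the sketch below lists the six wraparound neighbours of a coordinate in a 20x16x16 torus and recomputes the 30GB/s figure. It is illustrative only. The function name is hypothetical, and reading "5GB/s x 3 x 2" as three axes times two directions is an assumption. The full-connection part of ToFu is not modelled.

```python
def torus_neighbors(coord, dims=(20, 16, 16)):
    """Return the six neighbours of coord in a 3D torus with wraparound."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # Modulo gives the wraparound link at the edge of each axis
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


LINK_GB_S = 5                        # GB/s per link, from the slide
torus_bandwidth = LINK_GB_S * 3 * 2  # 3 axes x 2 directions (assumed reading)

corner = torus_neighbors((0, 0, 0))
print(corner)
print(torus_bandwidth, "GB/s")
```

A corner coordinate such as (0, 0, 0) still has six neighbours, which is the property that distinguishes a torus from a plain mesh.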

8 1600 750 2000 mm 3 52m 36m 2007/4/27 4 48

25 SW (8) (50) 10GbE SW 10GbE 50TB RAID10 18 8 SCFB 50TB RAID10 18 8 SW SW IO SB SW SW (320) (320) (320) (320) 10GbE SW 10GbE SW 10GbE SW 10GbE SW 1GbE SW 1GbE SW 1(12) 8GFC 56 56 77PB 2007/4/27 4 49

OS POSIXUNIX OS SW OpenMP MPI : Fortran HPF CAF XP Fortran C/C++ A 8SMP 87,552 B ToFu 2007/4/27 4 50

RAS CPU ECC RAM 3 2007/4/27 4 51

3.3 B 2007/4/27 4 52

B system configuration (figure)
System: 384 N, 12,288 CPU, 49,152 cores, 3.14PFLOPS
Memory: 0.375-0.75PB (32-64GB per CPU)
Interconnect: 2-stage Fat-tree, (24 + 16) x 16; 16GB/s x 16 links x 2 per N; switches SW0 #00 to #23, SW2 #00 to #15
7MW, 900
N (NUMA): 32CPU, 128Core, 8.19TFLOPS, 1-2TB
CPU: 256GFLOPS, L2$: 8MB, MEM: 32-64GB at 256GB/s
Core: 2GHz, 64GFLOPS (2FMA x 8VPP)

B: features
CPU: 45nm, 256GFLOPS; 4 cores, 2GHz; 8FMA x 2; 128KB; 8MB L2; RDB (Reusable Data Buffering)
System: 12,288 CPU, 3.14PFLOPS, 0.375-0.75PB; N = 32 CPU; OS
Power: 140W/CPU (Linpack)
Interconnect: 98TB/s, 2-stage Fat tree, 384 N

4 1 8MB L2 2GHz 64GFLOPS CPU 256GFLOPS 1B/FLOP 8MB L2 256GB/s 128GB/s 1B/FLOP L2 4B/FLOP RDB (Reusable Data Buffering) 256GF 64GFx4 8MB 8way- 64B/4 Unified 1B/FLOP 16GB/s 2 256GB/s 2007/4/27 4 55
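The 1B/FLOP figure on this slide is the ratio of memory bandwidth to peak CPU performance. The sketch below computes it for the B processor (256GB/s, 256GFLOPS) and, for comparison, for the A processor (64GB/s, 128GFLOPS) using the values from the A configuration slide. The function name is hypothetical.

```python
def bytes_per_flop(bandwidth_gb_s, peak_gflops):
    """Bytes of memory traffic available per floating-point operation."""
    return bandwidth_gb_s / peak_gflops


b_ratio = bytes_per_flop(256, 256)   # B: 256 GB/s memory, 256 GFLOPS CPU
a_ratio = bytes_per_flop(64, 128)    # A: 64 GB/s memory, 128 GFLOPS CPU

print(f"B: {b_ratio} B/FLOP, A: {a_ratio} B/FLOP")
```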

128 4way 8 2 / 1 2007/4/27 4 56

N 4CPU 1U 8U 32CPU I/O NUMA 2CPUN33x33 16GB/s x 2 I/Ox86 NN N N 16GB/s x x 16 MM MM MM MM C C C C C C C C C C C C C C C C MM MM MM MM MM MM MM MM C C C C C C C C C C C C C C C C MM MM MM MM MM MM MM MM C C C C C C C C C C C C C C C C MM MM MM MM I/O I/O U #0 U #1 U #7 CPU CPU 2007/4/27 4 57

N 2Fat-tree 16GB/s32 32 20Gbps N 16 16 384 N 98TB/s SW2 24 #00 SW0 SW0 #00 #02 16 N 16 CPU #0~3 CPU #4~7 CPU #28~31 SW2 #02 SW2 #15 SW0 #03 SW0 #23 2007/4/27 4 58

54.5m 2 N ) 1I/O21 2000mm 2000mm 1000mm 2N 8 2000mm I/O SW 8SW 1000mm 600mm 800mm I/O 900 17m 800mm 1000mm 2007/4/27 4 59

OS: LinuxIO OS : SW : OpenMP MPI : Fortran HPF CAF C/C++ UPC 2007/4/27 4 60

RAS CPU ECCRAM(L2 ) I/F RAM MOD-N Out-of-N BIST (Built-In Test) / LSI ECC 1 N / OS CPU N I/O NN RAID6 I/O 2007/4/27 4 61

4. 2007/4/27 4 62

A SIMD ToFu SIMD RAS B Fat-tree VCSEL 20Gbps SerDes RAS 2007/4/27 4 63

A LSI (1/2) LSI 45nm LSI 8 HPC SIMD 6MB 128GFLOPS /101 / ) - RAM - Vth - - Vdd, Vbs 2007/4/27 4 64

A LSI (2/2) ( 10 ) LSI R A M L1$ L1 $ SEC DED ECC L2$ SEC DED ECC SEC DED ECC mtlb 2007 2008 2009 2010 GPR FPR GUB FUB PC PSTATE ALU SHIFT FMA 2007/4/27 4 65

2007/4/27 4 66 A (1/2) I/O 6.25Gbps 6.25Gbps PT 15 IDC 3.125Gbps SystemBoard ICC SystemBoard ICC SystemBoard ICC SystemBoard ICC SystemBoard ICC SystemBoard ICC SystemBoard ICC SystemBoard ICC SystemBoard ICC SystmBoard ICC CN SystmBoard ICC CN SystmBoard ICC CN SystmBoard ICC CN SystmBoard ICC CN SystmBoard ICC CN SystmBoard ICC CN SystmBoard ICC CN SystmBoard ICC CN

A (2/2) ToFu MPI 100PetaFlops HPC 2007 2008 2009 2010 2007/4/27 4 67

A (1/2) (SB) SB (SB) 2007/4/27 4 68

A (2/2) CPU0.006( ) / 2007 2008 2009 2010 / / 2007/4/27 4 69

A SIMD (1/2)
Basic, Extend 2 2/ Basic, Extend SIMD 2 SIMD

DO I=1,N
  IF ((I)) then
    A(I)=B(I)+C(I)
  ELSE
    X(I)=Y(I)*Z(I)
  ENDIF
ENDDO

L2, L1

DO I=1,N,2
  IF ((I)) then
    IF ((I+1)) then
      A(I)=B(I)+C(I)
      A(I+1)=B(I+1)+C(I+1)
    ELSE
      A(I)=B(I)+C(I)
      X(I+1)=Y(I+1)*Z(I+1)
    ENDIF
  ELSE
    IF ((I+1)) then
      X(I)=Y(I)*Z(I)
      A(I+1)=B(I+1)+C(I+1)
    ELSE
      X(I)=Y(I)*Z(I)
      X(I+1)=Y(I+1)*Z(I+1)
    ENDIF
  ENDIF
ENDDO
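The transformation on this slide unrolls the conditional loop by two and enumerates the four branch combinations, so that each branch body handles a pair of iterations. The sketch below checks that both forms compute the same result. It is an illustration, not the compiler's implementation. The mask array name `m` is hypothetical because the condition name is missing in the source, and the unrolled form assumes an even trip count.

```python
def original_loop(m, b, c, y, z):
    """One element per iteration, with a branch on the mask."""
    n = len(m)
    a, x = [0.0] * n, [0.0] * n
    for i in range(n):
        if m[i]:
            a[i] = b[i] + c[i]
        else:
            x[i] = y[i] * z[i]
    return a, x


def unrolled_loop(m, b, c, y, z):
    """Two elements per iteration, one branch body per mask combination."""
    n = len(m)
    assert n % 2 == 0, "sketch assumes an even trip count"
    a, x = [0.0] * n, [0.0] * n
    for i in range(0, n, 2):
        if m[i]:
            if m[i + 1]:
                a[i] = b[i] + c[i]
                a[i + 1] = b[i + 1] + c[i + 1]
            else:
                a[i] = b[i] + c[i]
                x[i + 1] = y[i + 1] * z[i + 1]
        else:
            if m[i + 1]:
                x[i] = y[i] * z[i]
                a[i + 1] = b[i + 1] + c[i + 1]
            else:
                x[i] = y[i] * z[i]
                x[i + 1] = y[i + 1] * z[i + 1]
    return a, x


# The mask covers all four combinations: (T,T), (T,F), (F,T), (F,F)
mask = [True, True, True, False, False, True, False, False]
vals = [float(i) for i in range(8)]
same = original_loop(mask, vals, vals, vals, vals) == unrolled_loop(mask, vals, vals, vals, vals)
print(same)
```

Only the branches where both elements take the same path map directly onto a two-wide SIMD operation. The mixed branches still execute one add and one multiply.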

A SIMD (2/2) Venus 8 ( ) SIMD 2007 2008 2009 2010 / / 2007/4/27 4 71

B LSI(1/2) (1) NMOS PMOS N+ N+ P_well P+ P+ N_well P_sub / 90nm 65nm 45nm (2) 45nmCMOS Low etc /SRAM etc Vth etc etc 2007/4/27 4 72

B LSI(2/2) LSI TEG LSI LSI 2007/1 2009/1 2009/4 2006 2007 2008 2009 2010 fix RTL LSI TEG TO LSI LSI LSI 2007/4/27 4 73

B(1/2) (1) (2) 20Gbps SerDes ITRS 2 5 10Gbps 1000/LSI 1/200 1/100 2007/4/27 4 74 100G bps 10G 1G ITRS 2 20Gbps 2000 2005 2010

B(2/2) FIX 2007/4 2007/4 2007/2 2008/2 2009/2 2009/3 2006 2007 2008 2009 2010 fix RTL LSI 2007/4/27 4 75

B(1/2) Program DO DO i = 1, 1, n +B(i-1)+ +B(i)+ = +B(i+1) END END DO DO i-1 i i+1 VL i-1 i i+1 VL 2007/4/27 4 76

B(2/2) 2007/4Q 2008/4Q 2009/4Q 2010/4Q 2006 2007 2008 2009 2010 fix RTL LSI 2007/4/27 4 77

5. 2007/4/27 4 78

21 9 7
SimFold, GAMESS, Modylas, RSDFT, NICAM, LatticeQCD, LANS, HPL (High Performance Linpack), NPB-FT

F 1/2 (SPARC64VI 1Core) SIMD SIMD or 2007/4/27 4 84

F 2/2 2007/4/27 4 85