Slides (untitled)

Transcription:


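FLOPS (floating-point operations per second): 1 FLOPS = 1 operation per second; 1 GF (Giga Flops) = 10^9 operations per second; 1 TF (Tera Flops) = 10^12 operations per second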

Aggregate systems performance; increasing parallelism; single-CPU performance; CPU frequencies

Cray-1 (Seymour Cray)

Cray-2

SX-2

SX-4

SX-6

CM5

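Performance scale with a speed analogy:
10 TF to 1 TF: SX-6 Multi Node (10^5 km/h to 10^4 km/h)
100 GF: SX-6 (1000 km/h)
10 GF: HPC Server (100 km/h)
1 GF: Server, PC (10 km/h)

1 GF (Giga Flops) = 10^9 floating-point operations per second (one billion per second)
1 TF (Tera Flops) = 10^12 floating-point operations per second (one trillion per second)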

10 10 ton 100 kW
1000 100 kg 1 kW
10 1 kg 10 W


Animation: change in surface temperature due to the increase of CO2, shown as the difference from 1991 temperatures at 5-year intervals (CRIEPI)

:VeritasDGC

C4H4S+H2


DNA

125
300 km x 300 km x 18; 1.5 x 10^5; 1.5 x 10^16; 100 GFLOPS: 5 4
50 km x 50 km x 50; 1.5 x 10^7; 1.5 x 10^18; 100 GFLOPS: 1.4; 1 TFLOPS: 50; 10 TFLOPS: 5; 400
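As a rough check on figures of this kind, time to solution is the operation count divided by the sustained rate. The sketch below takes 1.5 x 10^16 and 1.5 x 10^18 as operation counts (an interpretation of the figures above) and the three machine speeds quoted there. The function name and the `efficiency` parameter are hypothetical; no efficiency figure is given in the slides, so the printed values are peak-rate lower bounds only.

```python
GIGA = 10**9
TERA = 10**12
SECONDS_PER_DAY = 86400


def runtime_seconds(operations, peak_flops, efficiency=1.0):
    """Time needed to execute `operations` floating-point operations.

    `efficiency` is a hypothetical fraction of peak speed that the code
    actually sustains (1.0 means the machine always runs at peak).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return operations / (peak_flops * efficiency)


if __name__ == "__main__":
    # Operation counts and machine speeds as read from the slide.
    for ops in (1.5e16, 1.5e18):
        for peak in (100 * GIGA, 1 * TERA, 10 * TERA):
            days = runtime_seconds(ops, peak) / SECONDS_PER_DAY
            print(f"{ops:.1e} ops at {peak:.0e} FLOPS: {days:.2f} days at peak")
```

Real codes sustain only part of the peak rate, so passing a smaller `efficiency` lengthens every estimate proportionally.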

Each CPU executes its share of the computation (North American 24-hour precipitation). NEC SX-6/8A Power x 640: The Earth Simulator, > 40 TFLOPS, 1Q 2002
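One common way to give each CPU "its share" is a contiguous block decomposition of the grid. The sketch below is a minimal illustration under that assumption; the slides do not say how the precipitation run was actually partitioned, and `block_range` and the 10-column, 4-CPU example are hypothetical.

```python
def block_range(n_items, n_cpus, rank):
    """Return (start, stop) of the contiguous block owned by CPU `rank`.

    Items are split as evenly as possible: the first `n_items % n_cpus`
    CPUs receive one extra item each. `stop` is exclusive.
    """
    if not 0 <= rank < n_cpus:
        raise ValueError("rank out of range")
    base, extra = divmod(n_items, n_cpus)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Hypothetical example: 10 grid columns shared among 4 CPUs.
    for rank in range(4):
        print(rank, block_range(10, 4, rank))
```

Every item is owned by exactly one CPU, and block sizes differ by at most one, which keeps the load roughly balanced when each item costs the same.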

March 2002: Earth Simulator facilities (Research Building, Simulator Building). New Linpack record: 35.8 TFLOPS (5 x the previous #1, ASCI White, at 7.2 TF)

Japanese Computer Is World's Fastest, as U.S. Falls Back
By JOHN MARKOFF
SAN FRANCISCO, April 19. A Japanese laboratory has built the world's fastest computer, a machine so powerful that it matches the raw processing power of the 20 fastest American computers combined and far outstrips the previous leader, an IBM-built machine. The achievement, which was reported today by an American scientist who tracks the performance of the world's most powerful computers, is evidence that a technology race that most American engineers thought they were winning handily is far from over. American companies have built the fastest computers for most of the last decade. The accomplishment is also a vivid statement of contrasting scientific and technology priorities in the United States and Japan. The Japanese machine was built to analyze climate change, including global warming, as well as weather and earthquake patterns. By contrast, the United States has predominantly focused its efforts on building powerful computers for simulating weapons, while its efforts have lagged in scientific areas like climate modeling.

(312 km, T42L24)

(10.4 km, T1279L24)

Node values: U(LST(I,1)), U(I), U(LST(I,2)), U(LST(I,3))
UU(I) = COEF(I,1)*U(I) + COEF(I,2)*U(LST(I,1)) + COEF(I,3)*U(LST(I,2)) + COEF(I,4)*U(LST(I,3))
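The pattern above is indirect (list-vector) addressing: each node reads its neighbours through an index list rather than at fixed offsets. A minimal sketch follows, assuming the terms are combined as a weighted sum (the operators are not legible in the slide). Indices are 0-based here, unlike the Fortran-style names; `apply_stencil` and the 4-node test mesh are hypothetical.

```python
def apply_stencil(u, lst, coef):
    """Weighted sum of each node and its listed neighbours.

    u    : node values
    lst  : lst[i] holds the indices of the neighbours of node i
    coef : coef[i][0] weights u[i]; coef[i][k] weights the k-th neighbour
    Assumes the terms are multiplied and added (not stated in the source).
    """
    uu = []
    for i in range(len(u)):
        total = coef[i][0] * u[i]
        for k, j in enumerate(lst[i], start=1):
            total += coef[i][k] * u[j]  # gather u through the index list
        uu.append(total)
    return uu


if __name__ == "__main__":
    # Hypothetical mesh: 4 nodes, each connected to the other three.
    u = [1, 2, 3, 4]
    lst = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
    coef = [[-3, 1, 1, 1]] * 4
    print(apply_stencil(u, lst, coef))
```

Because `u[j]` is fetched at data-dependent addresses, a vector machine needs gather support in hardware to run this loop at full speed.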

Architectures: Scalar Processing; Vector Processing (Memory to Memory); Vector Processing (Vector Register); Shared Memory Multiprocessors; Distributed Memory Parallel Processor; Distributed Shared PP

Limitation and response:
Performance limitation by scalar processing: vector processing
Bottleneck in memory throughput: vector register, vectorizing compiler
Performance limitation by single processor: multiprocessor, parallelizing compiler
Bottleneck in memory throughput: distributed memory
Difficult to code: distributed shared memory

Block-diagram components: Scalar Processor; Vector Processor with Vector Pipes; Vector Processor with Vector Pipes and Vector Register; Processor; Main Memory; SMP; Network

Machines: Mainframe, CDC6600/7600, CYBER200, CRAY-1, SX-2, VP-200, S810/S820, CRAY-XMP/YMP, CRAY-C90/T90, SX-3/SX-4/SX-5, VP2000, S3800, VPP500, T3E, SP-2, CM5, nCUBE, PARAGON, SX-5/SX-6, RS6000/SP, O2K, TX7

µ-processor: Memory, Cache, Registers, Arithmetic Pipes (+, *, /)

(CPU) (P) P0 P1 P2 P3 A B C D A B C D

Inputs Yi, Zi and scalar S; output Xi:
Xi = (Yi + Zi) * S
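The element-wise operation above, read here as X_i = (Y_i + Z_i) times S (the multiplication sign is an assumption, since the operator is missing in the transcript), can be written so that no element depends on any other. That independence is what lets a vector unit process all elements with one instruction stream. The function name is hypothetical.

```python
def scaled_sum(y, z, s):
    """Return x with x[i] = (y[i] + z[i]) * s for every index i.

    Each output element depends only on the inputs at the same index,
    so the iterations could run in any order or all at once.
    """
    if len(y) != len(z):
        raise ValueError("y and z must have the same length")
    return [(yi + zi) * s for yi, zi in zip(y, z)]


if __name__ == "__main__":
    print(scaled_sum([1, 2, 3], [4, 5, 6], 2))
```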

CPU CPU CPU

            DO 20 I = I1, I2
               IF( I.LT.INXT )
     $            GO TO 20
               IF( WI( I ).EQ.ZERO ) THEN
                  INXT = I + 1
               ELSE
                  IF( A( I+1, I ).EQ.ZERO ) THEN
                     WI( I ) = ZERO
                     WI( I+1 ) = ZERO
                  ELSE IF( A( I+1, I ).NE.ZERO .AND. A( I, I+1 ).EQ.
     $                     ZERO ) THEN
                     WI( I ) = ZERO
                     WI( I+1 ) = ZERO
                     IF( I.GT.1 )
     $                  CALL DSWAP( I-1, A( 1, I ), 1, A( 1, I+1 ), 1 )
                     IF( N.GT.I+1 )
     $                  CALL DSWAP( N-I-1, A( I, I+2 ), LDA,
     $                              A( I+1, I+2 ), LDA )
                     CALL DSWAP( N, VS( 1, I ), 1, VS( 1, I+1 ), 1 )
                     A( I, I+1 ) = A( I+1, I )
                     A( I+1, I ) = ZERO
                  END IF
                  INXT = I + 2
               END IF
   20       CONTINUE
         END IF
         CALL DLASCL( 'G', 0, 0, CSCALE, ANRM, N-IEVAL, 1,
     $                WI( IEVAL+1 ), MAX( N-IEVAL, 1 ), IERR )
      END IF
*
      IF( WANTST .AND. INFO.EQ.0 ) THEN
*
*        Check if reordering successful
*
         LASTSL = .TRUE.

Nine Lessons Learned in the Design of CDC6600 (N. R. Lincoln)
"It's Really Not as Much Fun Building a Supercomputer as It Is Simply Inventing One" (High Speed Computer and Algorithm Organization, 1977)
Lesson 2: Circuit design and system architecture are only pieces in a large puzzle called supercomputer CPU. A major limitation on the feasibility of a given supercomputer project could well be the mechanical, power, packaging, and cooling requirements of the overall electronic design.


Data held by one CPU: 8, 2, 5, 1000, 659
Data held by another CPU: 391, 422, 10, 51
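When data is split across CPUs like this, a global result is usually obtained in two steps: each CPU reduces its own chunk, then the partial results are combined. The slides do not say which operation was intended, so the sketch below accepts any reduction (`sum` and `max` are used purely for illustration), and `partial_then_combine` is a hypothetical name.

```python
def partial_then_combine(chunks, op):
    """Reduce each CPU's chunk with `op`, then reduce the partial results.

    Returns (partials, combined). `op` must give the same answer when
    applied in two stages, as sum, max, and min do.
    """
    partials = [op(chunk) for chunk in chunks]  # one partial result per CPU
    return partials, op(partials)


if __name__ == "__main__":
    # The two data sets shown on the slide, one per CPU.
    chunks = [[8, 2, 5, 1000, 659], [391, 422, 10, 51]]
    print(partial_then_combine(chunks, sum))
    print(partial_then_combine(chunks, max))
```

Only the partial results cross between CPUs, so the communication volume is one value per CPU regardless of how much data each one holds.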


SX

LSI 216m 2cm 2cm 216m 2cm 2cm LSI 2cm LSI 2cm 1.5mm (0.15m ) 216m ( 0.1mm,5,200) 2m 1m

NEC/ 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00 01 SX-4 SX-5 SX-1/2 (1GFLOPS) SX-2 SX-3 SX-4 SX-5 SX-6 SX-3 (UNIX) (CMOS) (1Chip) PC9801 IBM ( ) 301 ( ) S/C S/C ( ) (MITI ) ( ) HNSX( ) (, ) ESS( ) NCAR Cray (SX) (MIT/LLNL/LANL/NASA/EPA) Daimler/ CSCS DLR Volvo Chrysler IDRIS INGV HARC NLR VW ( ) ( ) NSRC GTRI / (KMA) CSIRO Renault

[Figure: memory bits per chip and transistors per µ-processor chip versus design rule (nm). Source: ITRS 01]

[Figure: clock frequencies (Hz, on/off-chip), number of I/O pads, and power dissipation (W) for high-performance chips versus design rule (nm). Source: ITRS 01]

Logic Technology Roadmap

ITRS 99
Year: 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2008, 2011, 2014
MPU gate length (nm): 140, 120, 100, 85, 80, 70, 65, 45, 32, 22
ASIC gate length (nm): 180, 165, 150, 130, 120, 110, 100, 70, 50, 35
Nominal Ion at 25 °C (µA/µm) [NMOS/PMOS], high-performance: 750/350, 750/350, 750/350, 750/350, 750/350, 750/350, 750/350, 750/350, 750/350, 750/350
Maximum Ioff at 25 °C (pA/µm) (for minimum L device), low power: 5, 7, 8, 10, 13, 16, 20, 40, 80, 160
Equivalent physical oxide thickness Tox (nm): 1.9-2.5, 1.9-2.5, 1.5-1.9, 1.5-1.9, 1.5-1.9, 1.2-1.5, 1.0-1.5, 0.8-1.2, 0.6-0.8, 0.5-0.6
Lgate 3σ variation (nm) (dense and isolated lines): 14, 12, 10, 8.5, 8, 7, 6.5, 5, 3.2, 2.2
Gate electrode sheet Rs (Ω/sq): 4-6, 4-6, 4-6, 4-6, 4-6, 4-6, 4-6, 4-6, 4-6, 4-6
Silicide thickness (nm): 55, 45, 40, 34, 32, 28, 25, 20, 15, 12
Contact silicide sheet Rs (Ω/sq): 2.7, 3.3, 3.8, 4.4, 4.7, 5.4, 6.0, 7.5, 10.0, 12.5
Drain extension Xj (nm): 42-70, 36-60, 30-50, 25-43, 24-40, 20-35, 20-33, 16-26, 11-19, 8-13
Number of metal levels: 6-7, 6-7, 7, 7-8, 8, 8, 8-9, 9, 9-10, 10
Local wiring pitch (nm): 500, 450, 405, 365, 330, 295, 265, 185, 130, 95
Intermediate wiring pitch (nm): 640, 575, 520, 465, 420, 375, 340, 240, 165, 115
Minimum global wiring pitch (nm): 1050, 945, 850, 765, 690, 620, 560, 390, 275, 190
Conductor effective resistivity, Cu wiring (µΩ-cm): 2.2, 2.2, 2.2, 2.2, 2.2, 2.2, 2.2, 1.8, <1.8, <1.8
Barrier/cladding thickness (for Cu wiring) (nm): 17, 16, 14, 13, 12, 11, 10, 0, 0, 0
Interlevel metal insulator, effective dielectric constant (k): 3.5-4.0, 3.5-4.0, 2.7-3.5, 2.7-3.5, 2.2-2.7, 2.2-2.7, 1.6-2.2, 1.5, <1.5, <1.5

ITRS 01
Year of production: 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2010, 2013, 2016
DRAM 1/2 pitch (nm): 130, 115, 100, 90, 80, 70, 65, 45, 32, 22
MPU/ASIC 1/2 pitch (nm): 150, 130, 107, 90, 80, 70, 65, 50, 35, 25
MPU printed gate length (nm): 90, 75, 65, 53, 45, 40, 35, 25, 18, 13
MPU physical gate length (nm): 65, 53, 45, 37, 32, 28, 25, 18, 13, 9
Physical gate length, high-performance (HP) (nm) [1]: 65, 53, 45, 37, 32, 28, 25, 18, 13, 9
Equivalent physical oxide thickness for high-performance Tox (EOT) (nm) [2]: 1.3-1.6, 1.2-1.5, 1.1-1.6, 0.9-1.4, 0.8-1.3, 0.7-1.2, 0.6-1.1, 0.5-0.8, 0.4-0.6, 0.4-0.5
Gate depletion and quantum effects electrical thickness adjustment factor (nm) [3]: 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.5, 0.5, 0.5, 0.5
Tox electrical equivalent (nm) [4]: 2.3, 2.1, 2, 2, 1.9, 1.9, 1.4, 1.2, 1, 0.9
Nominal power supply voltage (Vdd) (V) [5]: 1.2, 1.1, 1, 1, 0.9, 0.9, 0.7, 0.6, 0.5, 0.4
Nominal high-performance NMOS sub-threshold leakage current, Isd,leak (at 25 °C) (µA/µm) [6]: 0.01, 0.03, 0.07, 0.1, 0.3, 0.7, 1, 3, 7, 10
Nominal high-performance NMOS saturation drive current, Idd (at Vdd, at 25 °C) (µA/µm) [7]: 900, 900, 900, 900, 900, 900, 900, 1200, 1500, 1500
Required percent current-drive "mobility/transconductance improvement" [8]: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 30%, 70%, 100%
Parasitic source/drain resistance (Rsd) (ohm-µm) [9]: 190, 180, 180, 180, 180, 170, 140, 110, 90, 80
Parasitic source/drain resistance (Rsd), percent of ideal channel resistance (Vdd/Idd) [10]: 16%, 16%, 17%, 18%, 19%, 19%, 20%, 25%, 30%, 35%
Parasitic capacitance, percent of ideal gate capacitance [11]: 19%, 22%, 24%, 27%, 29%, 32%, 27%, 31%, 36%, 42%
High-performance NMOS device τ (Cgate*Vdd/Idd-NMOS) (ps) [12]: 1.6, 1.3, 1.1, 0.99, 0.83, 0.76, 0.68, 0.39, 0.22, 0.15
Relative device performance [13]: 1, 1.2, 1.5, 1.6, 2, 2.1, 2.5, 4.3, 7.2, 10.7
Energy per (W/Lgate=3) device switching transition (Cgate*(3*Lgate)*V^2) (fJ/device) [14]: 0.347, 0.212, 0.137, 0.099, 0.065, 0.052, 0.032, 0.015, 0.007, 0.002
Static power dissipation per (W/Lgate=3) device (Watts/device) [15]: 0.5E-09, 6.7E-09, 1.0E-08, 1.1E-08, 2.6E-08, 5.3E-08, 5.3E-08, 9.7E-08, 1.4E-07, 1.1E-07

How to Utilize Chip Area? (2010)
Chip size: 6.2 cm^2 (0.07 µm rule)
µ-P core: 0.1 cm^2 (5 MTr)
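Reading the figures above as a 6.2 cm^2 chip and a 0.1 cm^2 processor core of 5 million transistors, the arithmetic behind the question can be sketched as follows. This is an upper bound only: it assumes, purely for illustration, that the whole die is tiled with cores and nothing is spent on caches, interconnect, or I/O. Areas are held in mm^2 so the division stays in exact integers; all names are hypothetical.

```python
CHIP_AREA_MM2 = 620           # 6.2 cm^2 expressed in mm^2
CORE_AREA_MM2 = 10            # 0.1 cm^2 expressed in mm^2
CORE_TRANSISTORS = 5_000_000  # 5 MTr per core


def cores_that_fit(chip_area_mm2, core_area_mm2):
    """Upper bound on whole cores that fit, ignoring all other circuitry."""
    return chip_area_mm2 // core_area_mm2


if __name__ == "__main__":
    n_cores = cores_that_fit(CHIP_AREA_MM2, CORE_AREA_MM2)
    print("cores:", n_cores)
    print("transistors in cores:", n_cores * CORE_TRANSISTORS)
```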


Collaboration Tools, Data Mgmt Tools... Distributed simulation

END