
2013 (409812)

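FabHetero automatically generates single-ISA heterogeneous multi-core processors, and this thesis proposes FabCache, the part of FabHetero that generates its caches. Six SPEC2000INT benchmarks were each run for 10 million instructions, and their IPC was measured to confirm that the generated caches work correctly. The area of a FabCache-generated cache differs from that of a hand-designed cache by only 0.076%.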

Abstract Single-ISA heterogeneous multi-core processors are of increasing importance in processor architecture. However, designing a single-ISA heterogeneous multi-core processor requires design and verification effort that is multiplied by the number of different cores. FabHetero is therefore proposed to solve this problem: it automatically generates diverse heterogeneous multi-core processors using FabScalar, FabCache, and FabBus. This paper proposes and implements FabCache, which automatically generates the diverse cache systems of FabHetero. To show that the caches generated by FabCache work correctly, we executed 10 million instructions of each of six SPEC2000INT benchmarks. We also compared the area of a cache generated by FabCache with that of a hand-designed cache. According to the evaluation results, caches generated by FabCache work correctly and occupy almost the same area as the hand-designed cache, with a difference of 0.076%.

Contents
1
2
3
  3.1 FabScalar
    3.1.1
4 FabHetero
  4.1 FabBus
5 FabCache
  5.1
  5.2
6
7
References
A
B

List of Figures
2.1
3.2
4.3 FabHetero
5.4
5.5
6.6
6.7 Instructions Per Cycle (fetch width 1)
6.8 Instructions Per Cycle (fetch width 2)
6.9 Instructions Per Cycle (fetch width 4)
6.10 Instructions Per Cycle (fetch width 8)

List of Tables
5.1 L1 cache parameters
6.2
2.3

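1
FabScalar [1] automatically generates RTL (Register Transfer Level) designs of superscalar cores. Building on FabScalar, FabHetero [2] automatically generates single-ISA heterogeneous multi-core processors.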

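This thesis proposes FabCache, the part of FabHetero that generates caches. The remaining chapters describe FabScalar, FabHetero, and FabCache in turn, followed by the evaluation of FabCache and the conclusion.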

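2
[Figure 2.1: multi-core CPUs, panels Homogeneous and Heterogeneous]
A homogeneous multi-core processor consists of identical cores, whereas a heterogeneous multi-core processor combines cores of different designs. Figure 2.1 illustrates both organizations.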


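3
3.1 FabScalar
FabScalar [2] composes synthesizable RTL designs of arbitrary superscalar cores within a canonical superscalar template. FabHetero uses FabScalar to generate its cores.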

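3.1.1
Figure 3.2 compares a single pipeline with a 2-way superscalar pipeline. Each instruction passes through five stages: F (fetch), D (decode), E (execute), M (memory access), and W (write-back). In the 7 cycles shown, the single pipeline completes 3 instructions. The superscalar pipeline processes two instructions per stage per cycle, so it completes 6 instructions, twice as many.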

[Figure 3.2: pipeline timing over cycles 1 to 7. SinglePipeline runs instructions 1 to 3 and SuperScalar runs instructions 1 to 6, two per cycle, through stages F D E M W]
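The cycle counts in Figure 3.2 follow from a simple rule for an ideal pipeline with no stalls. N instructions issued W at a time through S stages finish after ceil(N / W) + S - 1 cycles. The Python sketch below checks this rule against the figure. It is illustrative only: the function names are hypothetical, and it models the five-stage textbook pipeline of the figure, not FabScalar's actual pipeline.

```python
import math

STAGES = ["F", "D", "E", "M", "W"]  # the five stages shown in Figure 3.2


def total_cycles(n_instructions: int, width: int, depth: int = len(STAGES)) -> int:
    """Cycles until the last of n instructions leaves an ideal pipeline.

    Assumes no stalls or hazards: a group of `width` instructions enters F
    every cycle, so the last group enters at cycle ceil(n / width) and then
    needs depth - 1 more cycles to reach W.
    """
    groups = math.ceil(n_instructions / width)
    return groups + depth - 1


def timeline(n_instructions: int, width: int) -> list[str]:
    """One text row per instruction; each cycle column is two characters wide."""
    rows = []
    for i in range(n_instructions):
        start = i // width  # cycle offset at which instruction i is fetched
        rows.append(" " * (2 * start) + " ".join(STAGES))
    return rows


for row in timeline(3, 1):
    print(row)
print(total_cycles(3, 1))  # single pipeline: 3 instructions in 7 cycles
print(total_cycles(6, 2))  # 2-way superscalar: 6 instructions in 7 cycles
```

Doubling the width halves the number of issue groups. It leaves the fill latency of S - 1 cycles unchanged, so for long instruction streams the throughput approaches W instructions per cycle.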

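4 FabHetero
[Figure 4.3: organization of FabHetero. Processing units PU0, PU1, and PU2 have L1 instruction caches (L1-I) and L1 data caches (L1-D), together with L2 caches (L2, L2-I, L2-D), FabBus, and Shared Memory. The legend distinguishes FabScalar, FabCache, L1 Inst Cache, L1 Data Cache, and Implemented]
As Figure 4.3 shows, FabHetero consists of FabScalar (the cores), FabCache (the caches), and FabBus (the bus).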

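This thesis focuses on FabCache, the cache-generating part of FabHetero.
4.1 FabBus
FabBus provides the bus that connects the caches of FabHetero to shared memory.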

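5 FabCache
As Figure 4.3 shows, each processing unit in FabHetero has its own L1 instruction and data caches, and these L1 caches are backed by L2 caches. In this thesis, FabCache generates the L1 caches. The L1 cache parameters are:
- Fetch Width: the number of instructions fetched per cycle.
- Cache Size.
- Way Size: the associativity.
- Line Size: the number of words per line.
- Word Size between L1 and L2: the number of words transferred between L1 and L2 at a time.
- SRAM Access Latency: this depends on the SRAM used.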

Table 5.1 summarizes the parameter ranges supported for the FabScalar core and FabCache.

Table 5.1: L1 cache parameters
PARAMETER                     RANGE
Fetch Width                   1–8
Cache Size                    2^n
Way Size                      1–32
Line Size                     1–8
Word Size between L1 and L2   1–LineSize
SRAM Access Latency           depends on SRAM module
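The ranges in Table 5.1 can be checked mechanically before a cache is generated. The Python sketch below is hypothetical, not FabCache's actual interface:

- The class and function names are illustrative.
- Cache Size is assumed to be in bytes, with 4-byte words.
- SRAM Access Latency is omitted because it depends on the SRAM module.

The sketch also derives the number of sets from the other parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L1Params:
    """Hypothetical container for the Table 5.1 parameters."""
    fetch_width: int   # Fetch Width: instructions fetched per cycle
    cache_size: int    # Cache Size: assumed to be in bytes, must be 2**n
    ways: int          # Way Size: associativity
    line_words: int    # Line Size: words per line
    l1_l2_words: int   # Word Size between L1 and L2: words per transfer


def check(p: L1Params) -> list[str]:
    """Return the Table 5.1 ranges that p violates (empty list if valid)."""
    errors = []
    if not 1 <= p.fetch_width <= 8:
        errors.append("fetch_width not in 1..8")
    if p.cache_size < 1 or p.cache_size & (p.cache_size - 1):
        errors.append("cache_size not a power of two")
    if not 1 <= p.ways <= 32:
        errors.append("ways not in 1..32")
    if not 1 <= p.line_words <= 8:
        errors.append("line_words not in 1..8")
    if not 1 <= p.l1_l2_words <= p.line_words:
        errors.append("l1_l2_words not in 1..line_words")
    return errors


def num_sets(p: L1Params, word_bytes: int = 4) -> int:
    """Sets = capacity / (ways * bytes per line); 4-byte words are an assumption."""
    return p.cache_size // (p.ways * p.line_words * word_bytes)


cfg = L1Params(fetch_width=4, cache_size=8192, ways=2, line_words=4, l1_l2_words=2)
print(check(cfg))     # []
print(num_sets(cfg))  # 256
```

Returning a list of violations rather than raising on the first one lets a generator report every bad parameter of a configuration at once.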

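5.1
[Figure 5.4: FabScalar core and cache; the core fetches FETCH_WIDTH (1 to 8) instructions]
As Figure 5.4 shows, the FabScalar core fetches FETCH_WIDTH instructions at a time from the cache, with FETCH_WIDTH between 1 and 8. Figure 5.5 illustrates the case of a fetch width of 4.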

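In the example of Figure 5.5, the memory holds the words a to p, and one line holds 4 words (for example, a b c d). In a normal memory, one access reads one line. The four words c d e f span two lines, so fetching them takes 2 accesses (2 cycles). Figure 5.5 instead splits the lines into two banks, with even lines in one bank and odd lines in the other. With this organization, c d e f can be read in a single access.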
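The access counts in this example can be reproduced with a small model. The Python sketch below is an illustration, not FabCache's SystemVerilog, and its function names are hypothetical. Its assumptions are:

- Words are numbered from 0 (a = 0, ..., p = 15).
- A normal memory reads one line per access.
- The two-bank memory keeps even-numbered lines in one bank and odd-numbered lines in the other, and can read one line from each bank in the same cycle.

```python
WORDS = "abcdefghijklmnop"  # memory contents from the Figure 5.5 example
LINE_WORDS = 4              # four words per line (Line0 = a b c d, Line1 = e f g h, ...)


def lines_touched(start: int, width: int, line_words: int = LINE_WORDS) -> list[int]:
    """Indices of the lines covered by words start .. start + width - 1."""
    first = start // line_words
    last = (start + width - 1) // line_words
    return list(range(first, last + 1))


def fetch_words(start: int, width: int) -> str:
    """The words a fetch group of the given width returns."""
    return WORDS[start:start + width]


def accesses_single_bank(start: int, width: int, line_words: int = LINE_WORDS) -> int:
    """Normal memory: one line per access, so one access per touched line."""
    return len(lines_touched(start, width, line_words))


def accesses_even_odd(start: int, width: int, line_words: int = LINE_WORDS) -> int:
    """Even/odd banks read in parallel: the busier bank sets the access count."""
    lines = lines_touched(start, width, line_words)
    even = sum(1 for n in lines if n % 2 == 0)
    return max(even, len(lines) - even)


start = WORDS.index("c")
print(fetch_words(start, 4))           # cdef
print(accesses_single_bank(start, 4))  # 2
print(accesses_even_odd(start, 4))     # 1
```

As long as the fetch width does not exceed the line size, a fetch touches at most two consecutive lines, one even and one odd. The two-bank model therefore needs a single access for every such fetch. A fetch wider than a line could touch a third line and need a second access.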

[Figure 5.5: Normal Memory with Line0 to Line7 (words a to p, four per line), compared with an Even Bank holding Line0, Line2, ... and an Odd Bank holding Line1, Line3, ...; example fetch group c d e f]
5.2
The L1 cache was implemented in SystemVerilog. Figure 6.6 [...] uses the same memory contents as Figure 5.5 [...] d e f [...] c d e f [...].

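6
To confirm that the caches generated by FabCache work correctly, six SPEC2000INT benchmarks (bzip, gzip, gap, mcf, parser, and vortex) were executed with fetch widths of 1, 2, 4, and 8. Instructions per cycle (IPC) was measured for each run, and Figures 6.7 to 6.10 show the results. [...] 128KB [...] IPC [...] 100% [...]. Table 6.2 compares the area of the cache generated by FabCache with that of a hand-designed cache; the two differ by only 0.076%.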
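The evaluation sweeps three dimensions: fetch width, cache size, and benchmark. The Python sketch below enumerates the configurations plotted in Figures 6.7 to 6.10 and states the IPC metric. It is a hypothetical illustration, not the author's simulation scripts, and it treats "Perfect" simply as one more point on the cache-size axis.

```python
from itertools import product

# Evaluation matrix read off Figures 6.7-6.10 (illustrative only).
BENCHMARKS = ["bzip", "gzip", "gap", "mcf", "parser", "vortex"]
FETCH_WIDTHS = [1, 2, 4, 8]                                      # one figure per width
CACHE_SIZES = ["4KB", "8KB", "16KB", "32KB", "64KB", "Perfect"]  # x-axis points


def configurations() -> list[tuple[int, str, str]]:
    """Every (fetch width, cache size, benchmark) combination that is plotted."""
    return list(product(FETCH_WIDTHS, CACHE_SIZES, BENCHMARKS))


def ipc(committed_instructions: int, cycles: int) -> float:
    """Instructions per cycle: committed instructions divided by elapsed cycles."""
    return committed_instructions / cycles


runs = configurations()
print(len(runs))  # 144 = 4 fetch widths x 6 cache sizes x 6 benchmarks
print(runs[0])    # (1, '4KB', 'bzip')
```

Over a long run a core cannot commit more instructions than it fetches, so a run's IPC is bounded above by its fetch width.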

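[Figure 6.6: Normal Memory (Line0 to Line7, words a to p) and the Even Bank (Line0, Line2) / Odd Bank (Line1, Line3) organization, with the word groups c d e and c d e f]

Table 6.2: Area and delay of the cache generated by FabCache and the hand-designed cache
                 Area (µm²)   Delay (ns)
FabCache         819,209      2.27
Hand-designed    818,590      2.43

7
This thesis proposed FabCache, which generates the L1 caches of FabHetero from the parameters in Table 5.1. The caches it generates work correctly, and their area differs from that of a hand-designed cache by only 0.076%.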
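The 0.076% figure follows directly from the two areas in Table 6.2. The short Python check below takes the hand-designed cache as the reference. Taking the FabCache area as the reference instead gives 619 / 819,209, which also rounds to 0.076%. The function and constant names are illustrative.

```python
def relative_difference_pct(generated: float, reference: float) -> float:
    """Percentage by which `generated` exceeds `reference`."""
    return (generated - reference) / reference * 100.0


FABCACHE_AREA_UM2 = 819_209  # Table 6.2, cache generated by FabCache
HAND_AREA_UM2 = 818_590      # Table 6.2, hand-designed cache

diff = relative_difference_pct(FABCACHE_AREA_UM2, HAND_AREA_UM2)
print(f"{diff:.3f}%")  # 0.076%
```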

[Figure 6.7: Instructions Per Cycle, fetch width 1]
[Figure 6.8: Instructions Per Cycle, fetch width 2]
[Figure 6.9: Instructions Per Cycle, fetch width 4]
[Figure 6.10: Instructions Per Cycle, fetch width 8]
Each figure plots IPC (0 to 1.4) against cache size (4, 8, 16, 32, and 64 KB, and Perfect) for bzip, gzip, gap, mcf, parser, and vortex.

[...] L1 [...] L2 [...]


[1] N. K. Choudhary, S. V. Wadhavkar, T. A. Shah, H. Mayukh, J. Gandhi, B. H. Dwiel, S. Navada, H. H. Najaf-abadi, and E. Rotenberg. FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores within a Canonical Superscalar Template. Proceedings of the 38th IEEE/ACM Int'l Symposium on Computer Architecture (ISCA-38), pp. 11–22, June 2011.
[2] [...], Eric Rotenberg, [...]. FabScalar [...] Alpha 21264 [...]. SACSIS2012.
[3] [...]. [...] AMBA [...]. SWOPP2012.

A
B
Table 2.3:
PARAMETER            RANGE
FabCache
  SIZE ICACHE        1024–32768
  SIZE ICACHE WAY    1
  SIZE DCACHE        8192
  SIZE DCACHE WAY    1
  L2LATENCY          1
FabScalar
  FETCH WIDTH        1–8