2013 (409812)
FabHetero FabHetero FabHetero FabCache FabCache SPEC2000INT 6 1000 IPC FabCache 0.076%
Abstract Single-ISA heterogeneous multi-core processors are of increasing importance in processor architecture. However, designing a single-ISA heterogeneous multi-core processor requires design and verification effort that is multiplied by the number of different cores. FabHetero is proposed to solve this problem. FabHetero automatically generates diverse heterogeneous multi-core processors using FabScalar, FabCache, and FabBus. This paper proposes and implements FabCache, which automatically generates diverse cache systems for FabHetero. To show that the caches generated by FabCache work correctly, we execute 10 million instructions on the SPEC2000INT benchmarks. We also compare the area of a cache generated by FabCache with the area of a hand-designed cache. According to the evaluation results, caches generated by FabCache work correctly and occupy almost the same area as the hand-designed cache.
1 RTL(Register Transfer Level) FabScalar[1] FabScalar FabScalar FabHetero[2] FabHetero 1
FabHetero FabCache 3 4 FabScalar 5 FabHetero 6 FabCache 7 2
2 Homogeneous Heterogeneous 2.1: CPU 1 ( 2.1) ( 2.1) 3
3 3.1 FabScalar FabScalar[2] FabScalar FabScalar FabScalar FabHetero 5
3.1.1 3.2 3.2 3.2 F D E M W 3.2 2 7 1 3 3 7 1 6 6 3.2 2 3 6
[Figure 3.2: pipeline diagrams for SinglePipeline and SuperScalar execution; stage labels F, D, E, M, W.]
4 FabHetero
[Figure 4.3: FabHetero. Block diagram labels: FabScalar, FabCache, FabBus, PU0, PU1, PU2, L1 Inst Cache, L1 Data Cache, L1-I, L1-D, L2, L2-I, L2-D, Shared Memory, Implemented.]
FabHetero 4.3 FabHetero FabScalar FabCache FabBus FabScalar FabCache FabBus
FabHetero FabCache 4.1 FabBus FabBus 9
5 FabCache 4.3 4.3 L1 4.3 L1 L2 4.3 L1L2 L2 L2 FabCache FabCache L1 L1 5.1 Fetch Width 1 Cache Size Way Size Line Size 1 Word Size between L1 and L2 L1 L2 SRAM Access Latency SRAM 10
5.1 1 FabScalar FabCache

Table 5.1: L1 cache parameters

PARAMETER                      RANGE
Fetch Width                    1 to 8
Cache Size                     2^n
Way Size                       1 to 32
Line Size                      1 to 8
Word Size between L1 and L2    1 to LineSize
SRAM Access Latency            depends on SRAM module
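The ranges in Table 5.1 can be read as a set of constraints on a cache configuration. The following is a minimal sketch of such a check, not FabCache's own code: the class name `L1CacheParams`, its field names, and the `errors` method are hypothetical, and the units of each parameter are not stated in the table, so none are assumed here. SRAM Access Latency is left out because the table says only that it depends on the SRAM module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L1CacheParams:
    """Hypothetical container for the Table 5.1 parameters (names are illustrative)."""
    fetch_width: int      # Fetch Width: 1 to 8
    cache_size: int       # Cache Size: a power of two (2^n)
    way_size: int         # Way Size: 1 to 32
    line_size: int        # Line Size: 1 to 8
    l1_l2_word_size: int  # Word Size between L1 and L2: 1 to line_size

    def errors(self):
        """Return a list of range violations; an empty list means the set is valid."""
        problems = []
        if not 1 <= self.fetch_width <= 8:
            problems.append("fetch_width must be in 1..8")
        # A positive integer is a power of two when it has exactly one bit set.
        if self.cache_size < 1 or self.cache_size & (self.cache_size - 1):
            problems.append("cache_size must be a power of two")
        if not 1 <= self.way_size <= 32:
            problems.append("way_size must be in 1..32")
        if not 1 <= self.line_size <= 8:
            problems.append("line_size must be in 1..8")
        if not 1 <= self.l1_l2_word_size <= self.line_size:
            problems.append("l1_l2_word_size must be in 1..line_size")
        return problems
```

For example, `L1CacheParams(4, 32768, 2, 8, 4).errors()` returns an empty list, while a fetch width of 9 or a cache size of 3000 is reported as a violation.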
5.1 Core Cache 1 FETCH_WIDTH18 2 3 4 5.4: FabScalar 1 5.4 1 5.5 5.5 FabScalar 4 12
a p 1 5.5 1 4 ( a b c d ) 1 1 1 c d e f 2 2 1 5.5 2 c d e f c d e f 1 13
[Figure 5.5: memory layout. Normal Memory: Line0 to Line7, with a b c d / e f g h / i j k l / m n o p shown. Even Bank: Line0 (a b c d), Line2 (i j k l). Odd Bank: Line1 (e f g h), Line3 (m n o p). Highlighted group: c d e f.]
5.2 L1 SystemVerilog 2 3 4 5 7 8 6.6 6.6 5.5 3 d e f
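Figure 5.5 shows lines split between an even bank and an odd bank, with the group c d e f spanning Line0 and Line1. A plausible reading is that this split lets a fetch group that crosses a line boundary be served from both banks at once. The sketch below models that reading in software under stated assumptions: four words per line as in the figure, and the hypothetical helper names `read_line` and `fetch`. It is not the SystemVerilog implementation.

```python
LINE_WORDS = 4  # words per line, matching the four-letter lines in the figure

# "Normal Memory" view: sixteen words a..p laid out as Line0..Line3.
memory = list("abcdefghijklmnop")
lines = [memory[i:i + LINE_WORDS] for i in range(0, len(memory), LINE_WORDS)]

# Split into two banks: even-numbered lines and odd-numbered lines.
even_bank = lines[0::2]  # Line0, Line2
odd_bank = lines[1::2]   # Line1, Line3


def read_line(line_no):
    """Read one line from whichever bank holds it."""
    bank = even_bank if line_no % 2 == 0 else odd_bank
    return bank[line_no // 2]


def fetch(start, width):
    """Fetch `width` consecutive words starting at word index `start`.

    A group that crosses a line boundary touches two consecutive lines.
    Consecutive lines always sit in different banks, so hardware could read
    both in the same cycle. This model simply reads them one after the other.
    The caller must keep the group inside the sixteen stored words.
    """
    assert 1 <= width <= LINE_WORDS
    first = start // LINE_WORDS
    offset = start % LINE_WORDS
    words = list(read_line(first))
    if offset + width > LINE_WORDS:  # group spills into the next line
        words += read_line(first + 1)
    return words[offset:offset + width]
```

With this layout, `fetch(2, 4)` reads Line0 from the even bank and Line1 from the odd bank and returns the words c, d, e, f.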
4 c d e f d e f 2 15
6 FabCache SPEC2000INT bzip gzip gap mcf parser vortex 6 1 2 4 81 IPC) 6.7 6.10 6.7 6.10 IPC 128KB IPC 100% IPC FabCache IPC FabCache FabCache 6.2 0.076% FabCache 1
[Figure 6.6: memory layout. Normal Memory: Line0 to Line7, with a b c d / e f g h / i j k l / m n o p shown. Even Bank: Line0 (a b c d), Line2 (i j k l). Odd Bank: Line1 (e f g h), Line3 (m n o p). Highlighted groups: c d e and c d e f.]

Table 6.2:

FabCache         819209 µm²    2.27 ns
Hand-designed    818590 µm²    2.43 ns

7 FabCache 0.076% 5.1
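The 0.076% figure is consistent with the two area values in Table 6.2. The short calculation below checks this, assuming the second row of the table is the hand-designed cache (its label did not survive) and that the percentage is taken relative to the hand-designed area. The variable names are illustrative.

```python
fabcache_area = 819209       # area of the FabCache-generated cache, from Table 6.2
hand_designed_area = 818590  # assumed to be the hand-designed cache row

# Absolute and relative area difference between the two designs.
difference = fabcache_area - hand_designed_area
overhead_percent = 100 * difference / hand_designed_area

print(f"{difference} um^2, {overhead_percent:.3f}%")  # prints "619 um^2, 0.076%"
```

Taking the FabCache area as the base instead gives a value that also rounds to 0.076%, so the choice of base does not affect the reported figure.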
[Figure 6.7: Instruction Per Cycle1. FETCH WIDTH 1; IPC (0 to 1.4) versus Cache Size (KB) at 4, 8, 16, 32, 64, and Perfect; series: bzip, gzip, gap, mcf, parser, vortex.]
[Figure 6.8: Instruction Per Cycle2. FETCH WIDTH 2; IPC (0 to 1.4) versus Cache Size (KB) at 4, 8, 16, 32, 64, and Perfect; series: bzip, gzip, gap, mcf, parser, vortex.]
[Figure 6.9: Instruction Per Cycle4. FETCH WIDTH 4; IPC (0 to 1.4) versus Cache Size (KB) at 4, 8, 16, 32, 64, and Perfect; series: bzip, gzip, gap, mcf, parser, vortex.]
[Figure 6.10: Instruction Per Cycle8. FETCH WIDTH 8; IPC (0 to 1.4) versus Cache Size (KB) at 4, 8, 16, 32, 64, and Perfect; series: bzip, gzip, gap, mcf, parser, vortex.]
L1 L2 20
[1] N. K. Choudhary, S. V. Wadhavkar, T. A. Shah, H. Mayukh, J. Gandhi, B. H. Dwiel, S. Navada, H. H. Najaf-abadi, and E. Rotenberg. FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores within a Canonical Superscalar Template. Proceedings of the 38th IEEE/ACM Int'l Symposium on Computer Architecture (ISCA-38), pp. 11-22, June 2011.
[2],, Eric Rotenberg,,, FabScalar Alpha 21264, SACSIS2012.
[3], AMBA, SWOPP2012.
A
B

PARAMETER            RANGE
FabCache
  SIZE ICACHE        1024 to 32768
  SIZE ICACHE WAY    1
  SIZE DCACHE        8192
  SIZE DCACHE WAY    1
  L2LATENCY          1
FabScalar
  FETCH WIDTH        1 to 8

Table 2.3: