2017 (411824)
Abstract  Multi-core processors are a common technique for achieving high computing performance. In many multi-core architectures, all cores share the L2 or last-level cache, so the performance of the entire processor depends strongly on the performance of the shared cache. In particular, the miss rate of the shared cache is one of the most important factors, because a core must wait for 100 to 1000 clock cycles whenever an access to the shared cache misses. Moreover, in a multi-core processor each core runs a different program, so the data each core requires, and the locations that data occupies in the cache, differ from core to core. As a result, the temporal and spatial locality of the shared cache, the property most important for reducing the number of access misses, is impaired. One line of research addressing this is cache partitioning, which allocates and restricts the cache space each core may access. In a set-associative cache, cache partitioning assigns ways to each core and thereby limits where the data handled by each core can be placed. Furthermore, by dynamically controlling the number of ways assigned according to each core's load, an appropriate cache capacity can be allocated to each task. However, because the allocation unit is a whole way, some cores may be given more ways than the memory requirements of their tasks actually need. A previous study proposed way allocation, which leaves such unneeded ways unallocated: by putting unassigned ways into an inactive state that consumes no power, way allocation reduces power consumption while suppressing performance degradation. However, both way allocation and cache partitioning share the problem that shared data cannot be handled. In addition, since assignment is performed only in units of ways, the allocation to each core is not optimal in many cases.
Therefore, the cell allocation cache has been proposed, which can handle shared data and which allocates cells, finely divided portions of a way, so that the cache is managed and utilized in finer-grained regions. However, unused cells in the cell allocation cache continuously waste power, so further improvement is still needed in terms of power consumption. In this paper, we propose a method that adds a shutdown sleep function to the cell allocation cache. Compared with the conventional cell allocation cache, the proposed method reduces power consumption by 16.16% on average.
1

[Chapter text not recovered; the introduction states the average power reduction of 16.16%.]
2

2.1

[Figure 2.1: a 4-way set-associative cache (Way0–Way3) with sets indexed 0–255, shared by core0 and core1; an address A maps to set A mod N.]
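The set-index calculation of Figure 2.1 (address A maps to set A mod N) can be sketched as follows. A minimal sketch; the 64-byte block size and the 256-set configuration are illustrative assumptions, not values taken from the thesis:

```python
# Minimal sketch of set-associative index calculation (Fig. 2.1).
# BLOCK_SIZE and NUM_SETS are illustrative assumptions.
BLOCK_SIZE = 64   # bytes per cache block (assumption)
NUM_SETS = 256    # N in the figure: sets indexed 0..255

def set_index(address: int) -> int:
    """Map a byte address A to its set: (A / block_size) mod N."""
    block_number = address // BLOCK_SIZE
    return block_number % NUM_SETS

# Two addresses BLOCK_SIZE * NUM_SETS bytes apart map to the same set.
print(set_index(0x1000))                          # block 0x40 -> set 64
print(set_index(0x1000 + BLOCK_SIZE * NUM_SETS))  # same set: 64
```

Because the index is taken modulo N, blocks that are exactly one cache-span apart compete for the same set, which is why partitioning decisions operate on ways rather than sets.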
2.2

[Figure 2.2: under the same workload, the ways Way0–Way3 are divided evenly between Core0 and Core1.]
[Figure 2.3: under a light workload on Core0 and a heavy workload on Core1, more ways are assigned to the heavily loaded core.]
[Figure 2.4: way allocation; ways assigned to Core0 and Core1 remain active, while unassigned ways are shut down (legend: Active, Shutdown).]
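The way-allocation scheme of Figures 2.2–2.4 can be sketched in code. A minimal sketch; the names (`WayState`, `allocate_ways`) and the demand dictionary are illustrative assumptions, not structures from the thesis:

```python
# Sketch of way allocation with shutdown of unassigned ways (Figs. 2.2-2.4).
# All names and structures here are illustrative, not the thesis's design.
from enum import Enum

NUM_WAYS = 4

class WayState(Enum):
    ACTIVE = "active"      # way assigned to a core, powered
    SHUTDOWN = "shutdown"  # way unassigned, power-gated

def allocate_ways(demand: dict) -> tuple:
    """Assign each core the number of ways it demands; shut down the rest."""
    assignment = {}
    state = [WayState.SHUTDOWN] * NUM_WAYS
    next_way = 0
    for core, n in demand.items():
        ways = list(range(next_way, next_way + n))
        assignment[core] = ways
        for w in ways:
            state[w] = WayState.ACTIVE
        next_way += n
    return assignment, state

# Light load on Core0 (1 way), heavier load on Core1 (2 ways):
# the fourth way stays unassigned and is shut down to save power.
assignment, state = allocate_ways({"Core0": 1, "Core1": 2})
print(assignment)  # {'Core0': [0], 'Core1': [1, 2]}
print(state[3])    # WayState.SHUTDOWN
```

The granularity problem is visible here: demands are rounded to whole ways, so a core that needs 1.5 ways' worth of capacity must receive either too little or too much.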
2.3

2.3.1

[Figure 2.5: the shared-data problem; with ways partitioned between Core0 and Core1, shared data cannot be placed so that both cores can use it, and accesses fall back to main memory (legend: read/write, write, Shared Data, Main Memory).]
2.3.2

[Figure 2.6: the regions of the cache (sets 0–255 × Way0–Way3) assigned to each core (legend: Core0 Assigned Region, Core1 Assigned Region).]
[Figure 2.7: the regions each core actually requires (legend: Core0 Require Region, Core1 Require Region); compared with the way-granularity assignment of Figure 2.6, the required capacities do not align with whole ways.]
3

3.1

[Text not recovered; this section describes the cell allocation cache [1].]
[Figure 3.8: the cell allocation cache; the ways Way0–Way3 are divided into cells, and each cell is individually assigned to core0 or core1 (legend: Cell(core0), Cell(core1)).]
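The finer-grained assignment of Figure 3.8 can be sketched as a per-cell ownership map. A hypothetical sketch; the `owner` table and `assign_cells` helper are illustrative assumptions, not the cell allocation cache's actual bookkeeping:

```python
# Sketch of cell-granularity allocation (Fig. 3.8): each (set, way) cell
# is individually assigned to a core, rather than whole ways.
# Hypothetical structure; not the thesis's actual implementation.
NUM_SETS = 256
NUM_WAYS = 4

# owner[s][w] is the core owning the cell at set s, way w (None = unassigned).
owner = [[None] * NUM_WAYS for _ in range(NUM_SETS)]

def assign_cells(core: str, sets: range, ways: range) -> None:
    """Give `core` the cells in the given set/way rectangle."""
    for s in sets:
        for w in ways:
            owner[s][w] = core

# Way granularity could only give each core whole columns; cell granularity
# lets core0's share shrink in the half of the sets it barely uses.
assign_cells("core0", range(0, 128), range(0, 2))    # lower sets: 2 ways
assign_cells("core0", range(128, 256), range(0, 1))  # upper sets: 1 way
assign_cells("core1", range(128, 256), range(1, 4))

print(owner[0][1])    # 'core0'
print(owner[200][1])  # 'core1'
```

Any cells left unassigned are exactly the ones the proposed shutdown sleep function can power-gate, which is the opportunity the rest of the thesis exploits.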
3.2

[Figure 3.9: handling of shared data in the cell allocation cache (legend: read/write, write, Shared Data, Main Memory).]
3.3

[Text not recovered; this section refers to Figure 3.10.]
[Figure 3.10: cell assignment across Way0–Way3 (legend: Cell (core0), Cell (core1)).]
4

4.1

[Text not recovered; this section presents the proposed method and cites [2].]

[Figure 4.11: the three sleep modes Mode1, Mode2, and Mode3.]
[Text not recovered; it describes Mode1, Mode2, and Mode3 in terms of fractions 1/4, 2/4, and 3/4 of the cells.]

4.2
[Figure 4.12: flowchart of the proposed sleep control; the surrounding text, which steps through the flowchart's YES branches for Core0 and Core1 and Mode1, was not recovered.]
5

5.1

The evaluation uses the Splash2 benchmark suite [3] and the Himeno benchmark [4].

Table 5.1: simulation parameters (labels reconstructed from the surviving values)
  Cores: 2
  L1 cache: 32 kB
  L2 cache: 256 kB, 8-way

5.2
The power model follows [5] and [6] and targets the shared L2 cache. The dynamic energy due to accesses is

E_access = E_one_access × N_access    (1)

where E_one_access is the energy of one access and N_access is the number of accesses.

[A three-item enumeration, mentioning the L3 case, was not recovered.]

The wake-up energy is

E_wakeup = E_one_wakeup × N_way × N_wakeup    (2)

where E_one_wakeup is the energy to wake up the SRAM cells of one way, N_way is the number of ways, and N_wakeup is the number of wake-ups. The total dynamic energy is

E_dynamic = E_access + E_wakeup    (3)

The static (leakage) energy is computed from the number of sets N_set, the execution cycle count Cycles, and the CPU frequency Freq:

E_static = E_set_leak × N_set × N_way × Cycles / Freq    (4)

where the leakage energy of one set, E_set_leak, depends on the ratio R_sleep of time spent in the sleep state, the sleep-state leakage E_sleep_leak, and the active-state leakage E_active_leak:

E_set_leak = R_sleep × E_sleep_leak + (1 − R_sleep) × E_active_leak    (5)

For the conventional cell allocation cache, which never sleeps, E_dynamic = E_access and E_set_leak = E_active_leak. The energy of the peripheral circuits is

E_P_circuit = E_controller × N_access + E_buffer × N_wakeup    (6)

where E_controller is the energy of one access to the sleep controller added to the L2 cache and E_buffer is the energy of one buffer operation at wake-up.
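Equations (1)–(6) can be exercised numerically with a small script. A minimal sketch; every parameter value below is an illustrative placeholder, not one of the CACTI-derived values used in the evaluation:

```python
# Numerical sketch of the energy model, Eqs. (1)-(6).
# All parameter values below are placeholders for illustration only.

def cache_energy(e_one_access, n_access,
                 e_one_wakeup, n_way, n_wakeup,
                 n_set, cycles, freq,
                 r_sleep, e_sleep_leak, e_active_leak,
                 e_controller, e_buffer):
    e_access = e_one_access * n_access                           # Eq. (1)
    e_wakeup = e_one_wakeup * n_way * n_wakeup                   # Eq. (2)
    e_dynamic = e_access + e_wakeup                              # Eq. (3)
    e_set_leak = (r_sleep * e_sleep_leak
                  + (1 - r_sleep) * e_active_leak)               # Eq. (5)
    e_static = e_set_leak * n_set * n_way * cycles / freq        # Eq. (4)
    e_p_circuit = e_controller * n_access + e_buffer * n_wakeup  # Eq. (6)
    return e_dynamic + e_static + e_p_circuit

# Proposed cache: cells sleep 60% of the time, at the cost of wake-up
# energy and peripheral-circuit (controller/buffer) energy.
proposed = cache_energy(e_one_access=1.0, n_access=1e4,
                        e_one_wakeup=0.1, n_way=8, n_wakeup=1e2,
                        n_set=512, cycles=1e10, freq=1e9,
                        r_sleep=0.6, e_sleep_leak=0.1, e_active_leak=1.0,
                        e_controller=0.01, e_buffer=0.01)
# Conventional cell allocation cache: never sleeps (r_sleep = 0),
# so there are no wake-ups and no added peripheral circuits.
conventional = cache_energy(e_one_access=1.0, n_access=1e4,
                            e_one_wakeup=0.1, n_way=8, n_wakeup=0,
                            n_set=512, cycles=1e10, freq=1e9,
                            r_sleep=0.0, e_sleep_leak=0.1, e_active_leak=1.0,
                            e_controller=0.0, e_buffer=0.0)
print(proposed < conventional)  # True: leakage savings outweigh overheads
```

With leakage-dominated placeholder values the proposed configuration wins; the model makes explicit that the savings in Eq. (5) must exceed the overheads added by Eqs. (2) and (6).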
[Text not recovered; it mentions a Verilog HDL implementation, overheads of 0.1% and 0.01% relative to the L2 cache, DRAM, CACTI, and cycle counts of 10, 100, and 1000.]

5.3

[Figure 5.13 is referenced; the proposed method reduces power consumption by 16.16% on average.]
[Figure 5.13: (caption not recovered).]

[Text not recovered; it refers to Figure 5.14 and a figure of 15.12%.]

[Figure 5.14: (caption not recovered).]

5.4

[Text not recovered; the discussion covers the 16.16% average reduction and the fft and radix benchmarks.]
6

[Conclusion text not recovered; it restates the reductions of 16.16% and 15.12%.]
[1] (authors and title not recovered), Technical Report, Vol. CPSY2016-20, pp. 119–124, August 2016.
[2] (authors and title not recovered), March 2014.
[3] S. C. Woo et al., "The SPLASH-2 programs: Characterization and methodological considerations," ACM SIGARCH Computer Architecture News, Vol. 23, No. 2, ACM, 1995.
[4] Himeno benchmark, <http://accc.riken.jp/supercom/himenobmt/> (accessed February 20, 2017).
[5] (authors and title not recovered), 17, pp. 235–240, April 2004.
[6] (authors and title not recovered; concerns the drowsy cache), 2006-ARC-170, pp. 37–41, December 2006.