16.16%


2017 (411824)

16.16%

Abstract

Multi-core processors are a common technique for achieving high computing performance. In many multi-core architectures, all cores share the L2 and last-level cache. The performance of the entire multi-core processor therefore depends strongly on the performance of the shared cache. In particular, the miss rate of the shared cache is one of the most important factors, because a core must wait 100 to 1000 clock cycles whenever an access to the shared cache misses. In addition, each core in a multi-core processor runs a different program, so the required data and its allocated locations in the cache differ from core to core. As a result, the temporal and spatial locality of the shared cache, the most important concept in memory design, is impaired, and methods to reduce the number of access misses have been studied.

One such method is cache partitioning, which allocates and restricts the cache space that each core can access. Cache partitioning is a memory access method for set-associative caches: it assigns to each core the ways that core may access, which limits where the data handled by each core can be placed. In addition, by dynamically controlling the number of ways according to the load of each core, an appropriate cache capacity can be allocated to each task. However, because the unit of allocation is a way, some ways may not be required by any core given the memory requirements of the tasks.

Previous work proposed way allocation, which leaves unused ways unallocated. Way allocation reduces power consumption while limiting performance degradation by placing unassigned ways in an inactive state that requires no electric power. However, neither way allocation nor cache partitioning can handle shared data. In addition, because assignment is based only on ways, the allocation to each core is not optimal in many cases.

To address these problems, a cell allocation cache has been proposed. It can handle shared data, and it allocates cells, which are finely divided portions of a way, so that the cache is managed at a finer granularity. However, unused cells in the cell allocation cache continuously waste power, so further improvement is still needed to lower power consumption. In this paper, we propose a method that adds a shutdown sleep function to the cell allocation cache. Compared with the conventional cell allocation cache, the proposed method reduces power consumption by 16.16% on average.
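The way-based partitioning described in the abstract can be illustrated with a minimal Python sketch of a single cache set. The class name `WayPartitionedSet`, the per-core LRU replacement, and the `shutdown` flags are assumptions made for illustration; the thesis does not give an implementation, and its actual replacement policy is not stated here.

```python
class WayPartitionedSet:
    """One set of a set-associative cache whose ways are divided among cores.

    Hypothetical model for illustration only.
    """

    def __init__(self, n_ways, allocation):
        # allocation maps a core id to the list of way indices it may use.
        self.tags = [None] * n_ways
        assigned = {w for ways in allocation.values() for w in ways}
        # Ways assigned to no core are marked as shut down (way allocation).
        self.shutdown = [w not in assigned for w in range(n_ways)]
        # Per-core LRU order: least recently used way first (assumed policy).
        self.lru = {core: list(ways) for core, ways in allocation.items()}

    def access(self, core, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        ways = self.lru[core]
        for w in ways:
            if self.tags[w] == tag:
                # Hit: move the way to the most recently used position.
                ways.remove(w)
                ways.append(w)
                return True
        # Miss: evict the least recently used way owned by this core.
        victim = ways.pop(0)
        self.tags[victim] = tag
        ways.append(victim)
        return False


# Core 0 owns one way, core 1 owns two, and Way3 is unassigned.
cache_set = WayPartitionedSet(4, {0: [0], 1: [1, 2]})
results = [
    cache_set.access(0, "a"),  # miss, fills Way0
    cache_set.access(0, "a"),  # hit
    cache_set.access(1, "a"),  # miss: core 1 cannot see core 0's copy
]
```

In this sketch a core only searches its own ways, so the third access misses even though core 0 already holds the line. That is the shared-data limitation the abstract attributes to cache partitioning and way allocation.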


1

16.16%

2

2.1

Figure 2.1: (labels: Index 0, 1, 2, 3, ..., 254, 255; core0, core1; Way0 to Way3)

A N A mod N 4k 16k N
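The surviving fragment "A mod N" together with the index range 0 to 255 suggests the usual set-index computation for a set-associative cache. The following is a minimal sketch under that reading; the function name `set_index` is hypothetical, and whether A denotes a byte address or a block address cannot be recovered from the text.

```python
def set_index(address: int, n_sets: int) -> int:
    """Return the set index A mod N for address A and N sets."""
    if n_sets <= 0:
        raise ValueError("n_sets must be positive")
    return address % n_sets


# With 256 sets (index 0 to 255), addresses wrap around every 256 entries.
indices = [set_index(a, 256) for a in (0, 255, 256, 0x1234)]
```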

Figure 2.2: (labels: Same Work Load; Core0, Core1; Way0 to Way3)

Figure 2.3: (labels: Light Work Load, Heavy Work Load; Core0, Core1; Way0 to Way3)

2.2

Figure 2.4: (labels: Core0, Core1; Way0 to Way3; legend: Active, Shutdown)

2.3

2.3.1

Figure 2.5: (labels: Core0, Core1; Way0 to Way3; Shared Data; Main Memory; legend: read/write, write)

2.3.2

Figure 2.6: (labels: Index 0 to 255; core0, core1; Way0 to Way3; legend: Core0 Assigned Region, Core1 Assigned Region)

Figure 2.7: (labels: Index 0 to 255; core0, core1; Way0 to Way3; legend: Core0 Require Region, Core1 Require Region)

3

3.1

[1]

Figure 3.8: (labels: core0, core1; Way0 to Way3; legend: Cell(core0), Cell(core1))

3.2

Figure 3.9: (labels: Core0, Core1; Way0 to Way3; Shared Data; Main Memory; legend: read/write, write)

3.3

Figure 3.10: (labels: core0, core1; Way0 to Way3; legend: Cell (core0), Cell (core1))

4

4.1

[2]

Figure 4.11: (labels: Mode1, Mode2, Mode3)

Mode1 1/4 2/4 3/4 Mode2 1/4 2/4 2/4 3/4 1/4 Mode3 Mode2 L3 Mode3 1/4 2/4 L3 L4

4.2

Figure 4.12:

1 Mode1 2 Core0 Core1 4.12 YES Core1 4.12 YES 1

5

5.1

Splash2[3] [4]

Table 5.1: 2, L1 32kB, L2 256kB, 8

5.2

[5] [6] L2

$E_{access}$, $E_{one\,access}$, $N_{access}$

$$E_{access} = E_{one\,access} \times N_{access} \tag{1}$$

1. 2. (L3) 3. L3

$$E_{wakeup} = E_{one\,wakeup} \times N_{way} \times N_{wakeup} \tag{2}$$

$E_{wakeup}$, $E_{one\,wakeup}$, $N_{way}$, $N_{wakeup}$, SRAM

$$E_{dynamic} = E_{access} + E_{wakeup} \tag{3}$$

$N_{set}$, $Cycles$, CPU, $Freq$

$$E_{static} = E_{set\,leak} \times N_{set} \times N_{way} \times \frac{Cycles}{Freq} \tag{4}$$

$$E_{set\,leak} = R_{sleep} \times E_{sleep\,leak} + (1 - R_{sleep}) \times E_{active\,leak} \tag{5}$$

$R_{sleep}$, $E_{sleep\,leak}$, $E_{active\,leak}$

$$E_{dynamic} = E_{access}$$

$$E_{static} = E_{active\,leak}$$

$$E_{P\,circuit} = E_{controller} \times N_{access} + E_{buffer} \times N_{wakeup} \tag{6}$$

$E_{P\,circuit}$, L2, $E_{controller}$, $E_{buffer}$
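Equations (1) to (6) can be evaluated with a short Python sketch. The class and function names, the comments describing each symbol, the use of Cycles divided by Freq as the elapsed time in equation (4), and the final sum of the three terms are all assumptions made for illustration. The transcription does not preserve the definitions or state how the terms are combined.

```python
from dataclasses import dataclass


@dataclass
class EnergyInputs:
    # Field meanings are inferred from the symbol names only (assumption).
    e_one_access: float   # E_one access
    n_access: int         # N_access
    e_one_wakeup: float   # E_one wakeup
    n_way: int            # N_way
    n_wakeup: int         # N_wakeup
    e_sleep_leak: float   # E_sleep leak
    e_active_leak: float  # E_active leak
    r_sleep: float        # R_sleep, taken to lie in [0, 1]
    n_set: int            # N_set
    cycles: int           # Cycles
    freq: float           # Freq
    e_controller: float   # E_controller
    e_buffer: float       # E_buffer


def energy_breakdown(p: EnergyInputs) -> dict:
    """Evaluate equations (1) to (6) and return each term."""
    e_access = p.e_one_access * p.n_access                       # (1)
    e_wakeup = p.e_one_wakeup * p.n_way * p.n_wakeup             # (2)
    e_dynamic = e_access + e_wakeup                              # (3)
    e_set_leak = (p.r_sleep * p.e_sleep_leak
                  + (1 - p.r_sleep) * p.e_active_leak)           # (5)
    # (4): Cycles / Freq is read as elapsed time (assumption).
    e_static = e_set_leak * p.n_set * p.n_way * p.cycles / p.freq
    e_p_circuit = (p.e_controller * p.n_access
                   + p.e_buffer * p.n_wakeup)                    # (6)
    return {
        "dynamic": e_dynamic,
        "static": e_static,
        "p_circuit": e_p_circuit,
        # Summing the three terms is an assumption of this sketch.
        "total": e_dynamic + e_static + e_p_circuit,
    }


# Made-up inputs chosen only to keep the arithmetic easy to follow.
example = energy_breakdown(EnergyInputs(
    e_one_access=2.0, n_access=10, e_one_wakeup=1.0, n_way=4, n_wakeup=3,
    e_sleep_leak=0.0, e_active_leak=2.0, r_sleep=0.5, n_set=256,
    cycles=1000, freq=1000.0, e_controller=0.5, e_buffer=1.0,
))
```

Setting `r_sleep` to 0 and `n_wakeup` to 0 reduces the dynamic term to the access energy and the leakage term to the active leakage, which corresponds to the two unnumbered relations given before equation (6).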

Verilog HDL 0.1% L2 DRAM CACTI 10 100 1000 0.01%

5.3

16.16%

Figure 5.13:

15.12%

Figure 5.14:

5.4

16.16% fft radix

6

16.16% 15.12%


[1] Vol.CPSY2016-20, pp.119-124, August 2016.
[2] 2014 3
[3] S. C. Woo, et al.: The SPLASH-2 programs: Characterization and methodological considerations, ACM SIGARCH Computer Architecture News, Vol. 23, No. 2, ACM, 1995.
[4] <http://accc.riken.jp/supercom/himenobmt/> (2017-02-20)
[5] 17, pp.235-240, April 2004.
[6] Drowsy, 2006-ARC-170, pp.37-41, December 2006.