FPGA 272 11 05340
26 FPGA 11 05340 1 FPGA (Field Programmable Gate Array) ASIC (Application Specific Integrated Circuit) FPGA FPGA FPGA FPGA Linux FreeDOS skewed way L1 FPGA skewed L2 FPGA skewed Linux skewed
1
  1.1
  1.2
2
  2.1
  2.2
  2.3 skewed
  2.4 FPGA
3
  3.1
  3.2
  3.3 skewed
  3.4 FPGA
4
  4.1
  4.2
  4.3
  4.4
  4.5
5
1 1.1 1 FPGA (Field Programmable Gate Array) ASIC (Application Specific Integrated Circuit) FPGA FPGA FPGA FPGA Linux FreeDOS skewed way L1 FPGA skewed L2 Linux 1
1.2 2 skewed FPGA 3 4 5 2
2 2.1 2.1.1 IC Intel i4004[1] 2.1 Intel [2] Gulftown Core i7-980x 6

Table 2.1: Intel processors

Year               1971     1974     1978          1985           2000         2010
Processor          4004     8080     8086          80386          Pentium 4    Core i7-980x (6 cores)
Bit width          4        8        16            32             32           64
Transistors        2,300    8,500    30 thousand   280 thousand   42 million   1.17 billion
Clock frequency    100kHz   1MHz     10MHz         20MHz          1GHz         3GHz
Memory             4.5K     64K      16M           4G             4G           24G
2.1.2 Core i7 1 2 2.2 L1 ( 1) L2 ( 2) [3] Core i7-980x L1 L1 32KBL2 256KB 6 6 12MB L3 2.2.1 4
valid valid 1 2.1 1 2.2.2 2 dirty 5
2.1: 2.2.3 2.2 1 1 2 6
2.2: 2.3 2.4 2 n n-way 2 LRU (Least Recently Used) LRU LRU 7
2.3: 2.4: n-way 2.3 skewed skewed [4][5] 2.3.1 skewed 2-way 2.5 2-way A B 2 2-way 1 2 2-way skewed 2.6 8
d d 2.5: 2-way 2.6: 2-way skewed skewed A B way 2-way skewed 1 4 2.3.2 skewed 2 n 1 4-way A A A 0 A 3 4 A = {A 3 A 2 A 1 A 0 } A 0 2 A 1 A 2 n A 3 A 1 {A 3 A 2 } skewed A 1 A 2 σϕϕ 0 3 index 0 index 1 index 2 index 3 9
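\begin{align}
\mathrm{index}_0 &= \{\phi(A_1) \oplus \phi'(A_2)\} \\
\mathrm{index}_1 &= \{\sigma(\phi(A_1)) \oplus \phi'(A_2)\} \\
\mathrm{index}_2 &= \{\sigma^2(\phi(A_1)) \oplus \phi'(A_2)\} \\
\mathrm{index}_3 &= \{\sigma^3(\phi(A_1)) \oplus \phi'(A_2)\}
\end{align}

Here $\sigma$ is a 1-bit rotation of an $n$-bit value, so $\sigma^2$ and $\sigma^3$ rotate the $n$-bit value by 2 and 3 bits.

As an example, take a 4-way cache with $2^4$ entries per way and the address $A = 1010101101100$. The fields $A_0$ to $A_3$ are $A_0 = \{00\}$, $A_1 = \{1011\}$, $A_2 = \{0101\}$, $A_3 = \{101\}$. With $\sigma$ a 1-bit, $\sigma^2$ a 2-bit, and $\sigma^3$ a 3-bit rotation, and with $\phi$ and $\phi'$ left out, $\mathrm{index}_0$ to $\mathrm{index}_3$ are:

\begin{align}
\mathrm{index}_0 &= \{A_1 \oplus A_2\} = \{1110\} \\
\mathrm{index}_1 &= \{\sigma(A_1) \oplus A_2\} = \{0010\} \\
\mathrm{index}_2 &= \{\sigma^2(A_1) \oplus A_2\} = \{1011\} \\
\mathrm{index}_3 &= \{\sigma^3(A_1) \oplus A_2\} = \{1000\}
\end{align}

4 way skewed

2.4 FPGA [6] ISA x86 OS FPGA Altera DE2-115 FPGA Linux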
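The worked example in 2.3.2 can be checked with a short sketch. It assumes that σ is a 1-bit left rotation of the n-bit field and that the two fields are combined with XOR; both are inferred from the index values printed in the example, which this choice reproduces. The helper names `rotl` and `skewed_indices` are hypothetical and not taken from the thesis.

```python
def rotl(value: int, shift: int, width: int) -> int:
    """Rotate a width-bit value left by shift bits."""
    shift %= width
    mask = (1 << width) - 1
    return ((value << shift) | (value >> (width - shift))) & mask


def skewed_indices(a1: int, a2: int, width: int, ways: int = 4) -> list[int]:
    """Return one index per way: sigma^way(A1) XOR A2 (phi taken as identity)."""
    return [rotl(a1, way, width) ^ a2 for way in range(ways)]


# Fields of the example address A = 1010101101100: A1 = 1011, A2 = 0101.
indices = skewed_indices(0b1011, 0b0101, width=4)
print([format(i, "04b") for i in indices])  # ['1110', '0010', '1011', '1000']
```

Because each way rotates A1 by a different amount before the XOR, two addresses that collide in one way generally land on different entries in the other ways.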
2.7: ao486 SoC (Tiny Core 5.3) FreeDOS 1.1 2.4.1 ao486 SoC x86 OS FPGA ao486[7] ao486 Verilog HDL Intel 80486SX ao486 VGA PS2 ao486 SoC 2.7 ao486 SoC ao486 SoC Terasic Altera DE2-115 FPGA Linux 3.13 Windows95 2.4.2 ao486 SoC ao486 SoC Altera Nios II Nios II BIOS Nios II BIOS Verilog HDL FPGA 2.8 11
2.8: FPGA ao486 16KB I/O DRAMVGAPS/2SD Card 4 PIT (Programmable Interval Timer)RTC (Real Time Clock)PIC (Programmable Interrupt Controller)HDDBIOS loader FPGA HDD SD contoller SD SD OS BIOS BIOS loader SD controller SD BIOS DRAM BIOS loader ao486 reset high BIOS BIOS loader ao486 reset low SD OS OS 2.9 Tiny Core 5.3 2.10 FreeDOS 1.1 DOOM[8] 12
2.9: FPGA Linux (Tiny Core 5.3) 2.10: FPGA FreeDOS 1.1 DOOM 13
3 3.1 2.4 FPGA Cyclone IV EP4CE115F29C7 Altera DE2-115 FPGA [10] Linux (Tiny Core 5.3) SPEC CPU92 [9] 3 1024 1024 256 256 12 C Intel 80486 GCC DRAM read write 3.1 sort mm go L1 DRAM read/write L1 read write 14
[Figure 3.1]

30% FPGA L2 L2 3.2 3.1 FPGA L2 3.1 8 Verilog HDL RAM 3 RAM RAM
Table 3.1: Cache configurations

Name                              Mapping                  Line size   Entries   Skewed
1word Direct mapped               Direct mapped            1 word      2^14      no
4word Direct mapped               Direct mapped            4 word      2^12      no
1word 2-way set associative       2-way set associative    1 word      2^13      no
4word 2-way set associative       2-way set associative    4 word      2^11      no
4word 2-way skewed associative    2-way set associative    4 word      2^11      yes
1word 4-way set associative       4-way set associative    1 word      2^12      no
4word 4-way set associative       4-way set associative    4 word      2^10      no
4word 4-way skewed associative    4-way set associative    4 word      2^10      yes

DRAM 3.2.1 3.2 IDLE read/write IDLE read/write COMP COMP hit/modify/miss 3 hit hit hit read/write 1 IDLE modify modify WB DRAM FETCH
3.2: L2 miss miss miss miss DRAM FETCH 1word write 32bit writedata 4bit byte enable byte enable 1word 1 2 write 4 write writedata modify miss FETCH IDLE 4word 4 FETCH 4word 4word DRAM 4word DRAM DRAM 4word 1word 17
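The controller states named in 3.2.1 (IDLE, COMP, WB, FETCH) and the three COMP outcomes (hit, modify, miss) can be sketched as a transition function. This is a minimal model under assumptions: it treats "modify" as a miss that must first write a line back to DRAM, it lets WB and FETCH finish in a single step, and the function name `next_state` is hypothetical. The actual design is written in Verilog HDL and waits on DRAM in those states.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()   # waiting for a read/write request
    COMP = auto()   # compare tags and classify the access
    WB = auto()     # write the evicted line back to DRAM
    FETCH = auto()  # fetch the requested line from DRAM


def next_state(state: State, request: bool = False, outcome: str = "") -> State:
    """Return the state that follows `state` in this simplified model."""
    if state is State.IDLE:
        return State.COMP if request else State.IDLE
    if state is State.COMP:
        if outcome == "hit":
            return State.IDLE      # data is served, no DRAM access
        if outcome == "modify":
            return State.WB        # assumed: old line is written back first
        if outcome == "miss":
            return State.FETCH
        raise ValueError(f"unknown outcome: {outcome!r}")
    if state is State.WB:
        return State.FETCH         # after write-back, fetch the new line
    return State.IDLE              # FETCH completes and returns to IDLE


def trace(outcome: str) -> list[State]:
    """List the states visited after IDLE for one request with the given outcome."""
    states = []
    state = next_state(State.IDLE, request=True)
    while state is not State.IDLE:
        states.append(state)
        state = next_state(state, outcome=outcome)
    return states


print([s.name for s in trace("modify")])  # ['COMP', 'WB', 'FETCH']
```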
3.3: LRU Control Memory 3.2.2 LRU 2-way 4-way LRU LRU 2-way 1bit 4-way 8bit LRU Control Memory 4 3.3 LRU Control Memory LRU LRU Control Memory 2bit 0123 bit bit 0 LRU Control Memory {00} 0 LRU Control Memory 2bit 3.3 skewed 4word 2-way 4-way skewed 2.3 A A = {A 3 A 2 A 1 A 0 } index 0 18
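index 3

\begin{align}
\mathrm{index}_0 &= \{\phi(A_1) \oplus \phi'(A_2)\} \\
\mathrm{index}_1 &= \{\sigma(\phi(A_1)) \oplus \phi'(A_2)\} \\
\mathrm{index}_2 &= \{\sigma^2(\phi(A_1)) \oplus \phi'(A_2)\} \\
\mathrm{index}_3 &= \{\sigma^3(\phi(A_1)) \oplus \phi'(A_2)\}
\end{align}

Here $\sigma$ is a 1-bit rotation, $\sigma^2$ a 2-bit rotation, and $\sigma^3$ a 3-bit rotation, and $\phi$ and $\phi'$ are left out. For $\mathrm{index}_1$, $A_1$ can be recovered from the index:

\begin{align}
\mathrm{index}_1 &= \{\sigma(A_1) \oplus A_2\} \\
A_1 &= \{\sigma^{-1}(\mathrm{index}_1 \oplus A_2)\}
\end{align}

{A 3 A 2 } 1

3.4 FPGA L2 2.4 FPGA DRAM L2 3.4 SD BIOS DRAM SD controller write read/write L2 3.5 readdata writedata 32bit read/write 1 4 0x00 4 read 0x000x040x08 0x0C 4 read Burst Controller 32bit 3.2.1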
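The inverse relation in 3.3, which recovers A_1 from an index and A_2, can be sketched as follows. The sketch assumes σ is a 1-bit left rotation and the combining operation is XOR, so σ⁻¹ is a 1-bit right rotation; the names `rotl`, `rotr`, `skewed_index`, and `recover_a1` are hypothetical. The point of the inverse is that A_1 does not have to be stored alongside {A_3 A_2}: it can be rebuilt when a full address is needed, for example on write-back.

```python
def rotl(value: int, shift: int, width: int) -> int:
    """Rotate a width-bit value left by shift bits."""
    shift %= width
    mask = (1 << width) - 1
    return ((value << shift) | (value >> (width - shift))) & mask


def rotr(value: int, shift: int, width: int) -> int:
    """Rotate a width-bit value right by shift bits."""
    return rotl(value, width - (shift % width), width)


def skewed_index(a1: int, a2: int, way: int, width: int) -> int:
    """Forward mapping: index_way = sigma^way(A1) XOR A2."""
    return rotl(a1, way, width) ^ a2


def recover_a1(index: int, a2: int, way: int, width: int) -> int:
    """Inverse mapping: A1 = sigma^-way(index_way XOR A2)."""
    return rotr(index ^ a2, way, width)


# Way 1 of the 4-bit example: index_1 = 0010, A2 = 0101 gives back A1 = 1011.
print(format(recover_a1(0b0010, 0b0101, way=1, width=4), "04b"))  # 1011
```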
3.4: L2 FPGA 3.5: L2 L2 64KB Cyclone IV EP4CE115F29C7 Altera DE2-115 FPGA Cyclone IV EP4CE115F29C7 20
4 4.1 4.1.1 Cyclone IV EP4CE115F29C7 Altera DE2-115 FPGA 3 FPGA Linux (Tiny Core 5.3) 4.1 Altera DE2-115 FPGA GPIO 34 UART PC PC TeraTerm FPGA read/write # PC # VGA writedata # 4.1.2 3.1 3 1024 1024 Xorshift 21
4.1: 256 256 Xorshift 2 12 OS 3 C Intel 80486 GCC Tiny Core FPGA Tiny Core 4.2 4.1 Cyclone IV EP4CE115F29C7 ao486 processor LC Util per CPU ao486 processor Logic Cells 64KB 22
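Section 4.1.2 names Xorshift as the generator of the benchmark input data. The thesis text does not say which Xorshift variant or seed is used, so the following is only a sketch of the common 32-bit variant with Marsaglia's (13, 17, 5) shift triple; the function names and the seed are assumptions.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(state: int) -> int:
    """Advance a nonzero 32-bit Xorshift state by one step."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state


def random_values(count: int, seed: int = 1) -> list[int]:
    """Generate `count` pseudo-random 32-bit values from a nonzero seed."""
    values = []
    state = seed
    for _ in range(count):
        state = xorshift32(state)
        values.append(state)
    return values


print(random_values(1))  # [270369]
```

A fixed seed makes every run sort or multiply the same data, which keeps read/write request counts comparable across cache configurations.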
Table 4.1

cache                             Logic Cells   Memory Bits   LC Util per CPU
1word Direct mapped                       510       737,280   1.4%
4word Direct mapped                       632       577,536   1.6%
1word 2-way set associative               589       761,865   1.6%
4word 2-way set associative               815       583,680   2.3%
4word 2-way skewed associative            846       583,680   2.3%
1word 4-way set associative               841       802,816   2.3%
4word 4-way set associative             1,183       593,920   3.3%
4word 4-way skewed associative          1,284       593,920   3.6%
ao486 processor                        36,159       310,784   100%

2-way 4-way LRU LRU Control Memory skewed skewed 4.3 3 read/write 4.2 L2 mm sort go sort&mm mm&go go&sort sort&mm&go 3 2-way 4-way 1word
[Figure 4.2]

L2 4word skewed skewed 2 3 1word 42.5% 4word 4-way skewed 6.51% 4.4 L2 3
4.3: L2 4.3 L2 L2 L2 4word 4-way skewed sort 1.55 1.46 1.37 3 1.34 1word L2 skewed skewed 2-way 4-way 1word 4word 25
skewed skewed 2-way 2-way skewed 4-way 4-way skewed 3 1word 171.3(sec) 4word 4-way skewed 136.9(sec) 1.25 4.5 4.5.1 1word 4word DRAM read/write 4word 4word DRAM read/write read/write 4word read 4 read 4word 4 read 3 1word 26
Table 4.2

app     read request (million)   write request (million)
sort                      54.6                      16.4
mm                       101.8                      70.7
go                       156.4                      25.3

4.5.2 4.2 read/write 1word OS read/write 4.2 read/write skewed int xorshift a b b 4.4 256 256 b[0] 0x000 b[1]b[2] b b[0] b[256] 0x400 skewed
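The access pattern described in 4.5.2, where b[0] sits at 0x000 and b[256] at 0x400, can be illustrated numerically. The sketch below assumes the 4word 4-way geometry from Table 3.1 (16-byte lines, 2^10 entries per way), a conventional index taken from address bits [13:4], and a way-0 skewed index formed as A_1 XOR A_2 with A_2 taken from bits [23:14]. This bit-field layout and the function names are assumptions for illustration, not the thesis implementation.

```python
LINE_BYTES = 16          # 4 words of 32 bits per line
INDEX_BITS = 10          # 2^10 entries per way
INDEX_MASK = (1 << INDEX_BITS) - 1


def conventional_index(addr: int) -> int:
    """Index used by a plain set associative cache: A1 only."""
    return (addr // LINE_BYTES) & INDEX_MASK


def skewed_index_way0(addr: int) -> int:
    """Way-0 skewed index: A1 XOR A2 (A2 is the field above A1)."""
    line = addr // LINE_BYTES
    a1 = line & INDEX_MASK
    a2 = (line >> INDEX_BITS) & INDEX_MASK
    return a1 ^ a2


# Byte addresses of b[0], b[256], b[512], ...: a stride of 256 ints = 0x400 bytes.
column = [4 * 256 * k for k in range(256)]

print(len({conventional_index(a) for a in column}))  # 16
print(len({skewed_index_way0(a) for a in column}))   # 256
```

With the conventional index, the 256 strided accesses fall on only 16 entries per way, so they keep evicting each other. Folding the upper field into the index spreads the same accesses over 256 distinct entries.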
4.4: 3 4.5 3 4.5.3 skewed skewed 4.6 4.7 2-way skewed 4-way 28
[Figure 4.5]

3 skewed 4-way 2-way skewed 2-way skewed 4-way 2-way skewed LRU Control Memory 4-way 2-way
[Figure 4.6: 2-way skewed 4-way]

[Figure 4.7: 2-way skewed 4-way]
5 OS FPGA FPGA Linux skewed skewed 3 4word 4-way skewed 1word 1.25 L2 1.34 skewed FPGA skewed 31
Thiem Van Chu 32
[1] Intel... http://japan.intel.com/contents/museum/hof/index.html
[2] .. 2,, 2011, 13p
[3] David A. Patterson / John L. Hennessy. &.. 4, BP, 2011
[4] André Seznec. A Case for Two-Way Skewed-Associative Caches. ISCA '93: Proceedings of the 20th Annual International Symposium on Computer Architecture, pp. 169-178, 1993
[5] André Seznec. A New Case for Skewed-Associativity. IRISA-INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France, 1997
[6] ,,,,,. MieruSys FPGA. RECONF CPSY VLD IPSJ-SLDM,, vol. 114, no. 428, RECONF2014-79, pp. 211-216, 2015
[7] ao486. https://github.com/alfikpl/ao486
[8] idsoftware. http://www.idsoftware.com/en-gb/
[9] spec. SPEC CPU92 Benchmarks. Standard Performance Evaluation Corporation. https://www.spec.org/cpu92/
[10] ALTERA. DE2-115 Development and Education Board. http://www.altera.co.jp/education/univ/materials/boards/de2-115/unv-de2-115-board.html