
22 / FPGA A Study of FPGA Platform for Architecture Evaluation of a Data-Driven/Control-Driven Processor 1110232

/ FPGA LSI [1] CDP DDP 2 LSI FPGA PicoProcessor(pP)[2] (STP)[1] DDP 1.27 i

Abstract

A Study of FPGA Platform for Architecture Evaluation of a Data-Driven/Control-Driven Processor

Hajime OOISO

Improving LSI performance only by raising the clock frequency can no longer meet market requirements, because power consumption rises with it. To improve processing performance further, LSIs must adopt parallel processing schemes that achieve higher power-performance efficiency. A data-driven processor (DDP) can extract the data parallelism inherent in a program and execute it in parallel [1]. In terms of instruction execution control, the control-driven processor (CDP) and the DDP sit at opposite ends of the spectrum. If the trade-offs between these two kinds of architecture can be assessed by direct comparison, it may become possible to find better-performing architectures and more nearly optimal LSI designs. Comparing diverse architectures requires that common parameters, such as data width and instruction memory capacity, can be modified flexibly. It is also important that the target architectures can be evaluated easily and in a short time. This paper proposes circuit description methods that make a high-speed FPGA platform easy to modify. A PicoProcessor (pP) [2] and a simple DDP based on the self-timed pipeline (STP) are implemented. In this implementation, editing the macro configuration file changes the data field length and width, and adds or removes the additional circuit. As a result, the performance per power of the DDP is 1.27 times that of the pP.

key words: Data-driven processor, Control-driven processor, Self-timed pipeline


1 1 CDP CDP DDP) CDP ( 1

DDP Dmem Dmem DDP MM) MM MM RAM MM MM MM Dmem 5 2 DDP/CDP 3 HDL HDL 4 Altera CycloneII QuartusII 5 / FPGA

2 / 2.0.1 CDP DDP CDP DDP 2.1 2.1.1 CPD 1 1 1 1 2.1 2.1 5 4

[Figure 2.1: Timing of the single-cycle scheme (IF + ID + EX + MEM + WB as one step against CLK) and the multi-cycle scheme (IF, ID, EX, MEM, WB as separate steps against CLK). IF: instruction fetch; ID: instruction decode; EX: execution; MEM: memory access; WB: write back.]

2.1.2 8 CDP PicoProcessor(pP) [2] pP CDP DDP pP 2.2 pP Program counter(PC) Instruction memory(IMem) PC

2.1 2.2 CDP Stack PC Stack pointer(sp) Stack push Interrupt register(int Reg) PC Register(Reg) Arithmetic Logic Unit(ALU) CC 6

2.2 Data memory(dmem) decoder PC PC IMem decoder MUX Stack Reg ALU Dmem Reg Dmem IMem 2.2 2.2.1 STP STP ( ) (Send ) Ack ) ( STP DDP 2.3 2.3 Data Latch(DL) Logic DL C C C Reset = 0, Send = 7

[Figure 2.3: STP. Each STAGE consists of a Data Latch followed by Logic. The data latches are clocked by CK0, CK1, CK2 from the C elements C0, C1, C2, which exchange Send0 to Send3 and Ack0 to Ack3 with their neighbours and share a reset input.]

1, Ack = 1, CK = 0
1. C0 C1 Send1 DL0
2. Send1 C1 Ack1 C0
3. Ack1 C0 Send
4. 1 3 Send Ack

2.2.2 2.4 2.4 6 6 Merge(M)
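The Send/Ack transfer control of the STP in Section 2.2.1 can be illustrated with a small behavioural model. This is a minimal sketch under stated assumptions, not the thesis circuit: each data latch holds at most one packet, a stage passes its packet on (Send) only when the next latch is empty, and the resulting Ack frees the sending latch. The names `step`, `run`, and `sink_ready` are hypothetical and introduced only for illustration.

```python
def step(latches, incoming=None, sink_ready=True):
    """Perform at most one Send/Ack transfer per stage boundary.

    latches: list where None marks an empty data latch (modified in place).
    Returns (packet leaving the last stage or None, whether incoming was accepted).
    """
    out = None
    # Last stage hands its packet to the environment if it is ready.
    if latches[-1] is not None and sink_ready:
        out, latches[-1] = latches[-1], None
    # Walk from the back so each packet advances by at most one stage.
    for i in range(len(latches) - 2, -1, -1):
        if latches[i] is not None and latches[i + 1] is None:
            latches[i + 1] = latches[i]  # Send: the next latch captures the packet
            latches[i] = None            # Ack: the sending latch becomes empty
    accepted = False
    if incoming is not None and latches[0] is None:
        latches[0] = incoming
        accepted = True
    return out, accepted


def run(packets, stages=3, max_steps=100):
    """Feed packets into an initially empty pipeline and collect the outputs."""
    latches = [None] * stages
    pending = list(packets)
    outputs = []
    for _ in range(max_steps):
        incoming = pending[0] if pending else None
        out, accepted = step(latches, incoming)
        if accepted:
            pending.pop(0)
        if out is not None:
            outputs.append(out)
        if not pending and all(latch is None for latch in latches):
            break
    return outputs
```

In this model only stages that actually hold a packet do any work, and a full downstream latch simply withholds the transfer, which is the property that lets a self-timed pipeline operate without a global clock.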

[Figure 2.4: DDP pipeline built from the blocks M, CS, MM, ALU, B, PS. M: Merge; CS: Constant Storage; MM: Matching Memory; ALU: Arithmetic Logic Unit; PS: Program Storage; B: Branch.]

Constant Storage
Constant Storage (CS) CS CS
Matching Memory
Matching Memory (MM)
Arithmetic Logic Unit
Arithmetic Logic Unit (ALU)
Program Storage
Program Storage (PS)

ALU Branch Branch(B) STP 2.3 DDP CDP pP pP 2.1

Table 2.1: pP instructions

Instruction   Operation     Function code
ADD           Addition      00000
SUB           Subtraction   00010
AND           Logical AND   00100

CDP IMem Function code DDP Function code DDP MM 2 2 Function code Function code Function code DDP 2 DDP 2.2

Table 2.2: DDP instructions

Instruction   Operation     R/L   Function code   Left opc   Right opc
ADD           Addition      0/1   000000          000        000
SUB           Subtraction   0/1   000100          000        100
AND           Logical AND   0/1   001000          001        000

DDP bit L/R) pP Function code 1bit(0) 6bit 6bit Function code 3bit bit Left opc 3bit Right opc L/R Function code Left opc Right opc MM Left opc Right opc bit DDP DDP CDP 2.4 CDP 1 pP STP DDP DDP MM MM CDP/DDP
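The relation between the pP and DDP function codes in Tables 2.1 and 2.2 can be checked with a few lines of Python. This is a sketch based only on the three rows shown: for ADD, SUB, and AND the 6-bit DDP function code equals the 5-bit pP code with one 0 bit appended, and its upper and lower 3 bits are the Left opc and Right opc. Whether the same rule holds for instructions not listed in the tables is an assumption, and the function names are hypothetical.

```python
# Function codes copied from Table 2.1 (pP).
PP_FUNCTION_CODES = {"ADD": "00000", "SUB": "00010", "AND": "00100"}


def _check_bits(code, length):
    """Reject anything that is not a bit string of the expected length."""
    if len(code) != length or set(code) - {"0", "1"}:
        raise ValueError(f"expected {length} bits, got {code!r}")


def ddp_function_code(pp_code):
    """Extend a 5-bit pP function code to 6 bits by appending a 0 bit."""
    _check_bits(pp_code, 5)
    return pp_code + "0"


def split_opcodes(ddp_code):
    """Split a 6-bit DDP function code into (Left opc, Right opc)."""
    _check_bits(ddp_code, 6)
    return ddp_code[:3], ddp_code[3:]


if __name__ == "__main__":
    for name, pp_code in PP_FUNCTION_CODES.items():
        code = ddp_function_code(pp_code)
        left, right = split_opcodes(code)
        print(name, code, left, right)
```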

2.4 12

3 3.1 3.2 3.1 HDL FPGA 13

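[Figure 3.1: Inputs: an assignment-type program, a computation-graph-type program, and a custom specification (e.g. data width, IM/PS capacity). Blocks: CDP and DDP, the CDP meta-HDL description, the DDP meta-HDL description, synthesis plus place and route, and the FPGA. Output: evaluation results (e.g. circuit size, power consumption, processing performance). CDP: control-driven processor; DDP: data-driven processor; IM: instruction memory; PS: program storage.]

3.1 subtype subtype CDP/DDP HDL HDL HDL 3.3 3.3.1 HDL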

[Figure 3.2: PS stage. Data path: DL1, PS, DL2. The PS input carries the fields color, dest, c, data; the PS output carries color, dest, L/R, con, opc, c, data. Declarations shown in the figure:
subtype SPIN is std_logic_vector(18 downto 0)
subtype SPOUT is std_logic_vector(23 downto 0)
cp: copy; color: color; dest: destination; L/R: left/right information; con: immediate operation; opc: operation code; c: carry flag; data: data. DL: data latch; PS: program storage.]

3.2 PS HDL 3.1 3.2 PS dest dest,l/r,con,opc color,c,data 3.1 PS dest 1. DL

[Figure 3.3: PS stage with one data latch per field. Data path: color passes through DL1; dest passes through DL2 into the PS; c and data pass through DL3 and DL4; the PS outputs dest, L/R, con, opc; the outputs are latched in DL5 and DL6. Declarations shown in the figure:
subtype color is std_logic_vector(2 downto 0)
subtype dest is std_logic_vector(6 downto 0)
subtype L/R is std_logic
subtype con is std_logic
subtype opc is std_logic_vector(2 downto 0)
subtype c is std_logic
subtype data is std_logic_vector(7 downto 0)
cp: copy; color: color; dest: destination; L/R: left/right information; con: immediate operation; opc: operation code; c: carry flag; data: data. DL: data latch; PS: program storage.]

DL5 DL6 DL6 DL3 DL4 3.3 PS 2. 3. PS HDL PS 3.2 3.3.2 HDL 3.4
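The subtype declarations in Figs. 3.2 and 3.3 are mutually consistent, and a short Python sketch makes the bookkeeping explicit. The field widths are taken from the per-field subtypes of Fig. 3.3; their sums reproduce the 19-bit SPIN and 24-bit SPOUT vectors of Fig. 3.2. The bit ordering (first field in the most significant position) and the helper names `total_width`, `pack`, and `unpack` are assumptions made for illustration, not taken from the thesis.

```python
# Field widths in bits, from the subtype declarations of Fig. 3.3.
# "lr" stands for the L/R field.
FIELD_WIDTHS = {"color": 3, "dest": 7, "lr": 1, "con": 1, "opc": 3, "c": 1, "data": 8}

PS_INPUT_FIELDS = ("color", "dest", "c", "data")                       # SPIN
PS_OUTPUT_FIELDS = ("color", "dest", "lr", "con", "opc", "c", "data")  # SPOUT


def total_width(fields, widths=FIELD_WIDTHS):
    """Number of bits needed to carry the given fields."""
    return sum(widths[name] for name in fields)


def pack(values, fields, widths=FIELD_WIDTHS):
    """Pack field values into one integer, first field most significant."""
    word = 0
    for name in fields:
        value = values[name]
        if not 0 <= value < (1 << widths[name]):
            raise ValueError(f"{name}={value} does not fit in {widths[name]} bits")
        word = (word << widths[name]) | value
    return word


def unpack(word, fields, widths=FIELD_WIDTHS):
    """Inverse of pack: recover the field values from the packed integer."""
    values = {}
    for name in reversed(fields):
        values[name] = word & ((1 << widths[name]) - 1)
        word >>= widths[name]
    return values
```

Changing a single entry of `FIELD_WIDTHS`, for example `data` from 8 to 16, changes every derived width without touching the rest of the code. This is the same kind of single-point modification that per-field subtypes allow in the HDL description.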

[Figure 3.4: MM stage. Blocks: DL, CX2, MUX1, MUX2, MUX3, MM, Dmem, Up-Down Counter, CEX. Signals: feb, cpy, match, down, up, dmem_empty, mm_full, addr, cp. Macro configuration file setting: 0 removes Dmem, 1 adds Dmem. The figure distinguishes the blocks of the MM stage before extension, the blocks needed to add Dmem, and the blocks needed to perform the removal. CX2: C element with copy function; CEX: C element with erase function; MM: matching memory; Up-Down Counter: address counter; DL: data latch; Dmem: data memory.]

3.4 MM MM MM MM MM CDP MM MM Dmem Dmem 3.4 MM Dmem MM DL MM CEX CEX

3.3 C MM Dmem CX2 MUX1 Dmem Up-Down Counter CX2 C 2 C MUX1 Dmem MM Dmem Dmem Ram Up-Down Counter Up-down Counter Dmem LIFO MUX2 MUX3 MUX2 MUX1 MUX1 feb Dmem 1 0 MUX2 feb 0 1 feb 0 0 MUX3 CX2 0 MUX2 MUX3 MUX1 CX2 Dmem Dmem 0 MUX1 CX2 0 MUX1 CX2 1 0 1 Dmem Dmem 18

3.4 3.4 CDP/DDP 19

4 4.1 DDP/CDP 4.2 Altera CycloneII P/CDP 4.1 4.2 4.1 CDP Reg Dmem 8 bit 8 8 bit 256 8 bit CDP [2] 8bit CDP DDP 8bit STP MM 32 DDP 20

4.3 4.2 DDP MM Dmem 7 8 bit 32 22 bit DDP ALU ALU CDP DDP Reg Dmem CDP QuartusII PowerPlay Power Analyzer (LE) 4.3 4.3 LE CDP 86% DDP 32%. DDP. CDP 25MHz DDP 102MHz

Table 4.3: CDP/DDP comparison

                        CDP            DDP
LE                      86%            32%
CLK                     25.54 MHz      102 MHz
Power consumption       50.32 mW       157.40 mW
Performance per power   0.507 MHz/mW   0.648 MHz/mW

DDP CDP. CDP 0.507 MHz/mW DDP 0.648 MHz/mW DDP 27%. DDP. STP.
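The performance-per-power figures in Table 4.3 follow directly from the measured clock frequency and power consumption. The short Python sketch below recomputes them as a sanity check. It uses only the values in the table; the function name `performance_per_power` is hypothetical.

```python
def performance_per_power(clock_mhz, power_mw):
    """Clock frequency divided by power consumption, in MHz/mW."""
    return clock_mhz / power_mw


# Measured values from Table 4.3.
cdp = performance_per_power(25.54, 50.32)    # CDP (pP)
ddp = performance_per_power(102.0, 157.40)   # DDP
ratio = ddp / cdp                            # DDP relative to CDP

if __name__ == "__main__":
    print(f"CDP: {cdp:.4f} MHz/mW")
    print(f"DDP: {ddp:.4f} MHz/mW")
    print(f"DDP/CDP: {ratio:.4f}")
```

The ratio lies between 1.27 and 1.28, which agrees with the 27% improvement and the factor of 1.27 quoted in the abstract.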

5 1 CDP/DDP CDP DDP MM MM MM FPGA 2 DDP/CDP 3 23

HDL HDL 4 Altera CycloneII QuartusII CDP 0.507MHz/mW DDP 0.648MHz/mW DDP 27% CDP/DDP DDP FPGA CDP DDP CDP 2 DDP pP


[1] H. Terada, et al., "DDMP's: Self-Timed Super-Pipelined Data-Driven Multimedia Processors," Proceedings of the IEEE, 87(2), pp. 282-296, Feb. 1999.
[2] D. A. Patterson and J. L. Hennessy, Computer Organization and Design, Morgan Kaufmann, p. 912, 2008.