A Study of FPGA Platform for Architecture Evaluation of a Data-Driven/Control-Driven Processor
1110232
Abstract

A Study of FPGA Platform for Architecture Evaluation of a Data-Driven/Control-Driven Processor

Hajime OOISO

Raising the clock frequency alone can no longer improve LSI performance enough to meet market requirements, because power consumption rises with it. To improve processing performance further, LSIs must adopt parallel processing schemes that achieve higher power-performance efficiency. A data-driven processor (DDP) can extract the data parallelism inherent in a program and execute it in parallel [1]. In terms of the instruction execution control scheme, the control-driven processor (CDP) and the DDP lie at opposite ends of the spectrum. Assessing the trade-offs between these two kinds of architecture through direct comparison could point toward higher-performance architectures and better LSI designs. Comparing diverse architectures requires that common parameters, such as data width and instruction memory, be flexible to modify. It is also important that target architectures can be evaluated easily and in a short time.

This paper proposes circuit description methods that make a high-speed FPGA platform easy to modify. A pico-processor (pp) [2] and a simple DDP based on the self-timed pipeline (STP) are implemented. In this implementation, editing the macro configuration file changes data field lengths and widths and adds or removes additional circuits. As a result, the performance per unit power of the DDP is 1.27 times that of the pp.

Key words: data-driven processor, control-driven processor, self-timed pipeline
1 1 CDP CDP DDP) CDP ( 1
DDP Dmem Dmem DDP MM) MM MM RAM MM MM MM Dmem 5 2 DDP/CDP 3 HDL HDL 4 Altera CycloneII QuartusII 5 / FPGA
2 / 2.0.1 CDP DDP CDP DDP 2.1 2.1.1 CDP 1 1 1 1 2.1 2.1 5
Fig. 2.1 (labels translated) Single-cycle scheme and multi-cycle scheme
Single-cycle scheme: IF + ID + EX + MEM + WB under CLK
Multi-cycle scheme: IF, ID, EX, MEM, WB as separate steps under CLK
IF: Instruction fetch, ID: Instruction decode, EX: Execute, MEM: Memory access, WB: Write back

2.1.2 8 CDP PicoProcessor(pP) [2] pp CDP DDP pp 2.2 pp
Program counter(pc)
Instruction memory(imem) PC
2.2 CDP
Stack PC
Stack pointer(sp) Stack push
Interrupt register(int Reg) PC
Register(Reg)
Arithmetic Logic Unit(ALU) CC
Data memory(dmem)
decoder PC

Fig. 2.2 CDP (blocks: PC, IMem, decoder, MUX, Stack, Reg, ALU, Dmem)

2.2.1 STP
STP ( ) (Send ) Ack ) ( STP DDP 2.3
2.3 Data Latch(DL) Logic DL C C C

Fig. 2.3 STP (stages of Data Latch and Logic; latch clocks CK0 to CK2; C elements C0 to C2; handshake signals Send0 to Send3 and Ack0 to Ack3; reset)

Reset = 0, Send = 1, Ack = 1, CK = 0
1. C0 C1 Send1 DL0
2. Send1 C1 Ack1 C0
3. Ack1 C0 Send
4. 1 3 Send Ack

2.2.2 2.4 2.4 6 6
Merge(M)
Fig. 2.4 (stages: M, CS, MM, ALU, B, PS)
M: Merge, CS: Constant Storage, MM: Matching Memory, ALU: Arithmetic Logic Unit, PS: Program Storage, B: Branch

Constant Storage
Constant Storage(CS) CS CS
Matching Memory
MM(Matching Memory)
Arithmetic Logic Unit
Arithmetic Logic Unit(ALU)
Program Storage
Program Storage(PS)
ALU
Branch
Branch(B) STP

2.3 DDP CDP pp pp 2.1

Table 2.1 pp
Instruction Operation Function code
ADD 00000
SUB 00010
AND 00100

CDP IMem Function code DDP Function code DDP MM 2 2 Function code Function code Function code DDP 2 DDP 2.2
Table 2.2 DDP
Instruction Operation R/L Function code Left opc Right opc
ADD 0/1 000000 000 000
SUB 0/1 000100 000 100
AND 0/1 001000 001 000

The DDP instruction carries an L/R bit. The pp Function code is extended by 1 bit (0) to 6 bits, and the 6-bit Function code is divided into a 3-bit Left opc and a 3-bit Right opc.

L/R Function code Left opc Right opc MM Left opc Right opc bit DDP DDP CDP

2.4 CDP 1 pp STP DDP DDP MM MM CDP/DDP
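The relation between the pp and DDP function codes in Tables 2.1 and 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the position of the added 0 bit (least significant) and the split into upper and lower 3 bits are inferred from the three rows shown in the tables, not from a stated rule.

```python
# Sketch only: helper names are hypothetical; bit positions are inferred
# from the ADD/SUB/AND rows of Tables 2.1 and 2.2.

def ddp_function_code(pp_code: str) -> str:
    """Extend a 5-bit pp function code to 6 bits by appending a 0 bit."""
    if len(pp_code) != 5 or set(pp_code) - {"0", "1"}:
        raise ValueError("expected a 5-bit binary string")
    return pp_code + "0"


def split_opc(function_code: str) -> tuple:
    """Split a 6-bit function code into (Left opc, Right opc), 3 bits each."""
    if len(function_code) != 6 or set(function_code) - {"0", "1"}:
        raise ValueError("expected a 6-bit binary string")
    return function_code[:3], function_code[3:]


# pp function codes from Table 2.1.
PP_CODES = {"ADD": "00000", "SUB": "00010", "AND": "00100"}

if __name__ == "__main__":
    for name, pp_code in PP_CODES.items():
        code = ddp_function_code(pp_code)
        left, right = split_opc(code)
        print(name, code, left, right)
```

For the three instructions listed, the sketch reproduces the Function code, Left opc, and Right opc columns of Table 2.2.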
3 3.1 3.2 3.1 HDL FPGA 13
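Fig. 3.1 (labels translated)
Inputs: assignment-style program; computation-graph-style program; custom specification (e.g., data width, IM/PS capacity)
Descriptions: CDP meta-HDL description; DDP meta-HDL description
Flow: synthesis + place and route, FPGA, evaluation results (e.g., circuit size, power consumption, processing performance)
CDP: control-driven processor, DDP: data-driven processor, IM: instruction memory, PS: program storage

subtype subtype CDP/DDP HDL HDL HDL

3.3
3.3.1 HDL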
Fig. 3.2 PS (labels translated)
Packet before the PS (DL1): color, dest, c, data
Packet after the PS (DL2): color, dest, L/R, con, opc, c, data
Data path:
    subtype SPIN is std_logic_vector(18 downto 0);
    subtype SPOUT is std_logic_vector(23 downto 0);
cp: copy, color: color, dest: destination, L/R: left/right information, con: immediate operation, opc: operation code, c: carry flag, data: data
DL: data latch, PS: program storage

3.2 PS HDL 3.1 3.2 PS dest dest,l/r,con,opc color,c,data 3.1 PS dest
1. DL
Fig. 3.3 PS (labels translated)
Latches: DL1 to DL6, with the fields color, dest, c, data, L/R, con, and opc latched separately
Data path:
    subtype color is std_logic_vector(2 downto 0);
    subtype dest is std_logic_vector(6 downto 0);
    subtype LR is std_logic;  -- L/R
    subtype con is std_logic;
    subtype opc is std_logic_vector(2 downto 0);
    subtype c is std_logic;
    subtype data is std_logic_vector(7 downto 0);
cp: copy, color: color, dest: destination, L/R: left/right information, con: immediate operation, opc: operation code, c: carry flag, data: data
DL: data latch, PS: program storage

2. 3. PS HDL PS 3.2

3.3.2 HDL 3.4
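The field widths given by the per-field subtype declarations can be checked against the 19-bit SPIN and 24-bit SPOUT vectors with a small Python model. This is a sketch under assumptions, not the thesis's HDL: the names `FIELD_WIDTHS`, `PS_IN_FIELDS`, `PS_OUT_FIELDS`, `packet_width`, `pack`, and `unpack` are hypothetical, and the bit order (first listed field in the most significant bits) is assumed from the order of the figure labels.

```python
# Sketch only: names are hypothetical; the field order within the packet
# is an assumption taken from the order of the figure labels.

# Field widths in bits, from the per-field subtype declarations.
FIELD_WIDTHS = {
    "color": 3,  # std_logic_vector(2 downto 0)
    "dest": 7,   # std_logic_vector(6 downto 0)
    "L/R": 1,    # std_logic
    "con": 1,    # std_logic
    "opc": 3,    # std_logic_vector(2 downto 0)
    "c": 1,      # std_logic
    "data": 8,   # std_logic_vector(7 downto 0)
}

# Fields carried by the packet before and after the PS stage.
PS_IN_FIELDS = ["color", "dest", "c", "data"]
PS_OUT_FIELDS = ["color", "dest", "L/R", "con", "opc", "c", "data"]


def packet_width(fields):
    """Total packet width in bits for the given field list."""
    return sum(FIELD_WIDTHS[name] for name in fields)


def pack(fields, values):
    """Concatenate field values into one integer, first field in the MSBs."""
    word = 0
    for name in fields:
        width = FIELD_WIDTHS[name]
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(fields, word):
    """Inverse of pack: split an integer back into named field values."""
    values = {}
    for name in reversed(fields):
        width = FIELD_WIDTHS[name]
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

Keeping the widths in one table mirrors the idea of per-field subtypes: changing a single width (for example, the data width) changes the packet width without touching the packing logic.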
Fig. 3.4 MM (labels translated)
Blocks: DL, CX2, CEX, MUX1, MUX2, MUX3, MM, Dmem, Up-Down Counter
Signals: feb, cpy, match, up, down, dmem_empty, mm_full, addr, cp
Macro configuration file: 0 = remove the Dmem, 1 = add the Dmem
Block groups: blocks of the MM stage before extension; blocks needed to add the Dmem; blocks needed to perform the removal
CX2: C element with copy function, CEX: C element with erase function, MM: matching memory, Up-Down Counter: address counter, DL: data latch, Dmem: data memory

3.4 MM MM MM MM MM CDP MM MM Dmem Dmem 3.4 MM Dmem MM DL MM CEX CEX
3.3 C MM Dmem CX2 MUX1 Dmem Up-Down Counter CX2 C 2 C MUX1 Dmem MM Dmem Dmem Ram Up-Down Counter Up-down Counter Dmem LIFO MUX2 MUX3 MUX2 MUX1 MUX1 feb Dmem 1 0 MUX2 feb 0 1 feb 0 0 MUX3 CX2 0 MUX2 MUX3 MUX1 CX2 Dmem Dmem 0 MUX1 CX2 0 MUX1 CX2 1 0 1 Dmem Dmem 18
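The surviving labels indicate that the Dmem is a RAM used as a LIFO and addressed by an Up-Down Counter, with `up`, `down`, and `dmem_empty` signals. A minimal behavioral sketch of such a buffer in Python follows. It is an assumption-laden illustration: the class name `LifoDmem`, the `depth` parameter, the `full` flag, and the error handling are hypothetical, and the sketch does not model the MUX1 to MUX3 control or the C elements.

```python
# Behavioral sketch only: class name, depth, the "full" flag, and the
# error handling are hypothetical and not taken from the thesis.

class LifoDmem:
    """RAM used as a LIFO, addressed by an up-down counter."""

    def __init__(self, depth):
        self.ram = [None] * depth
        self.addr = 0  # up-down counter value = number of stored entries

    @property
    def dmem_empty(self):
        """True when the counter is at zero, i.e. nothing is stored."""
        return self.addr == 0

    @property
    def full(self):
        """True when every RAM word is occupied."""
        return self.addr == len(self.ram)

    def push(self, packet):
        """Store a packet and count up."""
        if self.full:
            raise OverflowError("Dmem is full")
        self.ram[self.addr] = packet
        self.addr += 1

    def pop(self):
        """Count down and return the most recently stored packet."""
        if self.dmem_empty:
            raise IndexError("Dmem is empty")
        self.addr -= 1
        return self.ram[self.addr]
```

A single counter serves as both the write address and the fill level, which is why a LIFO needs only an up-down counter rather than separate read and write pointers.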
3.4 3.4 CDP/DDP 19
4 4.1 DDP/CDP 4.2 Altera CycloneII P/CDP 4.1 4.2 4.1 CDP Reg Dmem 8 bit 8 8 bit 256 8 bit CDP [2] 8bit CDP DDP 8bit STP MM 32 DDP 20
Table 4.2 DDP
MM Dmem 7 8 bit 32 22 bit

DDP ALU ALU CDP DDP Reg Dmem CDP QuartusII PowerPlay Power Analyzer (LE) 4.3

4.3
LE utilization is 86% for the CDP and 32% for the DDP. The clock frequency is 25 MHz for the CDP and 102 MHz for the DDP.
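Table 4.3 CDP/DDP
                        CDP            DDP
LE                      86%            32%
CLK                     25.54 MHz      102 MHz
Power consumption       50.32 mW       157.40 mW
Performance per power   0.507 MHz/mW   0.648 MHz/mW

The performance per power is 0.507 MHz/mW for the CDP and 0.648 MHz/mW for the DDP, so the DDP is about 27% better than the CDP.

DDP.STP.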
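The performance-per-power figures can be rechecked from the reported clock frequencies and power consumption. The short Python check below uses only the four measured values from the comparison; the dictionary layout and the function name `mhz_per_mw` are hypothetical.

```python
# Recomputes MHz/mW from the reported CLK and power values.
# Only the four measured numbers come from the evaluation; the names
# and structure here are illustrative.

RESULTS = {
    "CDP": {"clk_mhz": 25.54, "power_mw": 50.32},
    "DDP": {"clk_mhz": 102.0, "power_mw": 157.40},
}


def mhz_per_mw(result):
    """Clock frequency divided by power consumption."""
    return result["clk_mhz"] / result["power_mw"]


cdp = mhz_per_mw(RESULTS["CDP"])
ddp = mhz_per_mw(RESULTS["DDP"])
ratio = ddp / cdp  # how many times better the DDP is than the CDP

if __name__ == "__main__":
    print(f"CDP: {cdp:.4f} MHz/mW")
    print(f"DDP: {ddp:.4f} MHz/mW")
    print(f"DDP / CDP: {ratio:.4f}")
```

The recomputed values agree with the reported 0.507 MHz/mW and 0.648 MHz/mW to within 0.001, and the ratio falls between 1.27 and 1.28, consistent with the stated 27% advantage.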
5 1 CDP/DDP CDP DDP MM MM MM FPGA 2 DDP/CDP 3 23
HDL HDL 4 Altera CycloneII QuartusII CDP 0.507MHz/mW DDP 0.648MHz/mW DDP 27% CDP/DDP DDP FPGA CDP DDP CDP 2 DDP pp
[1] H. Terada, et al., "DDMP's: Self-Timed Super-Pipelined Data-Driven Multimedia Processors," Proceedings of the IEEE, 87(2), pp. 282-296, Feb. 1999.
[2] D. A. Patterson and J. L. Hennessy, Computer Organization and Design, Morgan Kaufmann, p. 912, 2008.