LSI with blocks A, B, C, D and a CPU; loop executed on the CPU: for (i = 0; i < k; i++) X[i] = X[i+j] ...
[Figure: PLD capacity from 10K to 10M, 1980 to 2000; labels: PLA, EEPROM, SPLD, CPLD, SRAM-based FPGA]
Reconfigurable systems timeline, 1990 to 2003 (marks at 1990, 1992, 1993, 1995, 2000, 2002, 2003). Conferences: the 1st FPL, the 1st Japanese FPGA/PLD Conf., the 1st FCCM. Systems: SPLASH, SPLASH-2, RM-I, RM-II, PRISM-I, PRISM-II, MPLD, WASMII, RM-III, Cache Logic, YARDS, RM-IV, DISC, RM-V, DISC-II, HOSMII, ATTRACTOR, PipeRench, FIPSOC, RASH, CHIMERA, DRL, PCA, Chameleon, ACM, DRP, DAP/DNA, PCA2, DAP/DNA2.
FPGA; CPU + FPGA; SoC; FPGA
SoPD (System on Programmable Device): CPU + reconfigurable system, OS, FPGA; FCCM
FPGA/CPLD; SoC; CPU + FPGA/PLD; DRP
SoC (System-on-a-Chip): CPU, memory, I/O, and application-specific hardware integrated on one LSI, used in cellular phones, network controllers, and mobile terminals. Problem: JPEG2000, AES, Turbo code, ...
CPU + FPGA? I/O, CPU, memory, application-specific hardware, and a common FPGA fabric. Examples: Xilinx Virtex-II Pro (PowerPC), Altera Excalibur (ARM).
CPU + dynamically reconfigurable processor: alongside the I/O, CPU, memory, and application-specific hardware, a dynamically reconfigurable processor with a coarse-grain structure and C-level programming.
C-level DRP
Chameleon CS2112 DPU: operations are written in C or Verilog; SIMD arrays and pipelines are formed with multiple DPUs. DPU structure: instruction, routing MUXes, barrel shifter, register masks, OP, registers.
DRP-1
On-chip memory; tens of microseconds (PACT Xpp, Elixent's DFA)
JPEG on a CPU: RGB to YCbCr → downsampling → DCT → quantization → VLC
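The first stage of the JPEG pipeline converts RGB pixels to YCbCr. The sketch below assumes 8-bit full-range input and the JFIF conversion coefficients; the slide does not say which coefficients are used. `rgb_to_ycbcr` and `_to_byte` are hypothetical helper names.

```python
import math
from typing import Tuple


def _to_byte(v: float) -> int:
    """Round half up and clamp to the 8-bit range 0..255."""
    return max(0, min(255, math.floor(v + 0.5)))


def rgb_to_ycbcr(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Convert one 8-bit RGB pixel to YCbCr with the JFIF (full-range) equations."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return _to_byte(y), _to_byte(cb), _to_byte(cr)
```

For example, pure red `rgb_to_ycbcr(255, 0, 0)` gives `(76, 85, 255)`; Cr saturates at 255 after clamping.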
Multi-context devices: the configurations for contexts 1 to n are stored in RAM/ROM (SRAM context slots). A multiplexer selects which one drives the logic cells that process the input and output data. Examples: MPLD (1990), WASMII (1992), Xilinx (1997), NEC DRP (2002). Features: partial context switch, runtime reconfiguration.
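The multi-context idea (several configurations kept in SRAM context slots, one selected by a multiplexer) can be modeled in a few lines. This toy sketch does not describe the actual behavior of MPLD, WASMII, or DRP. `MultiContextFabric` is a hypothetical class, and each configuration is modeled as a Python function. The rule that only inactive slots may be rewritten is an assumption, used here to illustrate runtime reconfiguration.

```python
from typing import Callable, List, Optional

# A configuration is modeled as the function the logic cells compute (assumption).
Config = Callable[[int, int], int]


class MultiContextFabric:
    """Toy multi-context device: n SRAM context slots and a multiplexer
    that selects which slot drives the logic cells."""

    def __init__(self, n_contexts: int) -> None:
        self.slots: List[Optional[Config]] = [None] * n_contexts
        self.active: Optional[int] = None  # multiplexer select; nothing selected yet
        self.switches = 0                  # context switches performed so far

    def load(self, slot: int, config: Config) -> None:
        """Runtime reconfiguration: rewrite an inactive slot while another one runs.
        Refusing to overwrite the active slot is an assumption of this model."""
        if slot == self.active:
            raise ValueError("cannot overwrite the active context")
        self.slots[slot] = config

    def switch(self, slot: int) -> None:
        """Context switch: only the multiplexer select changes."""
        if self.slots[slot] is None:
            raise ValueError(f"context {slot} is empty")
        self.active = slot
        self.switches += 1

    def run(self, a: int, b: int) -> int:
        """Evaluate the active configuration on one pair of inputs."""
        if self.active is None:
            raise ValueError("no context selected")
        config = self.slots[self.active]
        assert config is not None  # switch() never selects an empty slot
        return config(a, b)
```

In this model a context switch changes only which slot the multiplexer selects. That is why it can be much faster than reloading a full configuration.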
[Figure: context slots holding Task N and Task N+1, then Task N+2 and Task N+1]
Granularity scale (1, 4–5, 8, 16, 32 bit): FPGA, NTT PCA-2, Elixent DFA, PACT Xpp, NEC DRL, NEC DRP, PipeRench, IPFlex DAP/DNA
C-level design (C, BDL, Dataflow C, Stream C) versus ASIC design in Verilog HDL/VHDL
DRP
1990s: PACT Xpp, Elixent DFA1000, DSP, NTT PCA/PCA-2
Xpp (PACT Informationstechnologie): PACs (Processing Array Clusters) with I/O, each with its own CM (Configuration Manager), under an SCM (Supervising CM). Each PAC is an array of PAEs with a configuration controller. Xpp64: 8×8 array (configuration: 100; 24).
Elixent DFA1000: ALUs and registers connected by RAM-based switch boxes. [Figure: array of ALU and register (R) cells]
PCA (Plastic Cell Architecture), NTT: BP (built-in part) and PP (plastic part)
PCA basic cell: a plastic part (PP) built from LUTs and a built-in part (BP); hardware fork (Hw fork)
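The plastic part is built from LUTs. A generic k-input LUT simply returns the configuration bit addressed by its inputs. The sketch below assumes that encoding; the LUT size and bit order of PCA's actual cells are not given here. `lut_eval`, `XOR2`, and `AND4` are hypothetical names.

```python
from typing import Sequence


def lut_eval(truth_table: int, inputs: Sequence[int]) -> int:
    """Evaluate a k-input LUT.

    truth_table holds the 2**k configuration bits; inputs[0] is the least
    significant address bit. The output is the configuration bit at that address.
    """
    index = 0
    for position, bit in enumerate(inputs):
        if bit not in (0, 1):
            raise ValueError("LUT inputs must be 0 or 1")
        index |= bit << position
    return (truth_table >> index) & 1


XOR2 = 0b0110   # 2-input XOR: addresses 0..3 give 0, 1, 1, 0
AND4 = 1 << 15  # 4-input AND: only address 15 (all inputs 1) gives 1
```

To reconfigure the cell, you only rewrite `truth_table`; the evaluation logic stays the same.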
Chameleon; IPFlex DAP/DNA, DAP/DNA-2; NEC DRP; Quicksilver ACM; MorphTech rDSP; PicoChip PC101
Chameleon CS2112: RISC core, 32-bit PCI bus with PCI controller, 64-bit memory bus with memory controller, 128-bit RoadRunner bus, configuration subsystem, DMA subsystem, reconfigurable processing fabric, 160-pin programmable I/O
Reconfigurable processing fabric in Chameleon: 8 instructions stored in the CTL are executed in the DPU, and the CTL can select the next instruction in the same cycle. The configuration (CTL, LM, DPU) can be changed by loading a bit stream. The fabric has 108 DPUs (Data Path Units) in 4 slices (Slice 0 to Slice 3) of 3 tiles each. Each tile has 9 DPUs (7 32-bit ALUs and 2 16-bit × 16-bit multipliers) plus an LM and a CTL.
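The CTL/DPU arrangement can be illustrated with a toy model in which the CTL picks the next of up to eight instructions every cycle, based on the latest result. The instruction semantics, the 32-bit masking, and the names `run_dpu` and `ctl` are assumptions for illustration. The constants only restate the slide's counts (4 × 3 × 9 = 108 DPUs, 7 + 2 = 9 per tile).

```python
from typing import Callable, List

# Counts from the slide: 4 slices x 3 tiles x 9 DPUs per tile.
SLICES, TILES_PER_SLICE, DPUS_PER_TILE = 4, 3, 9
TOTAL_DPUS = SLICES * TILES_PER_SLICE * DPUS_PER_TILE  # 108
ALUS_PER_TILE, MULTIPLIERS_PER_TILE = 7, 2              # 7 + 2 = 9 DPUs per tile

MASK32 = 0xFFFFFFFF
Op = Callable[[int, int], int]


def run_dpu(program: List[Op], ctl: Callable[[int, int], int],
            acc: int, operand: int, cycles: int) -> int:
    """Toy DPU driven by a CTL (hypothetical model).

    program: 1 to 8 instructions, matching the slide's instruction count.
    ctl(pc, result): index of the next instruction, chosen in the same cycle.
    """
    if not 1 <= len(program) <= 8:
        raise ValueError("a DPU holds 1 to 8 instructions")
    pc = 0
    for _ in range(cycles):
        acc = program[pc](acc, operand) & MASK32  # 32-bit datapath
        pc = ctl(pc, acc)
        if not 0 <= pc < len(program):
            raise ValueError("CTL selected an undefined instruction")
    return acc
```

For instance, a CTL that keeps selecting an add until the result reaches 20 and then switches to a subtract turns a fixed 2-instruction store into data-dependent control flow.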
IPFlex DAP/DNA-2: DDR/SDR interface (64-bit, 166 MHz), PCI interface (32-bit, 66 MHz), DAP (RISC core), DMA controller, interrupt controller, timer, SROM interface, GPIO, UART, serial interface, BSU, DNA load buffer, DNA store buffer, DNA Matrix (368 ALUs), DNA direct I/O (async. in and out)
[Figure: DNA matrix elements: SMA, RAM, shift/mask units, ALUs, flip-flops (FF)]
PipeRench (CMU): stripes of PEs with pass registers, joined by interconnection and global buses
Pipelined reconfiguration. [Figure: a virtual pipeline of Stages 1 to 5 executed over cycles 1 to 6 on a physical pipeline of Stages 1 to 3]
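The virtual-to-physical mapping in pipelined reconfiguration can be sketched as a round-robin schedule. This simplified model assumes that exactly one physical stripe is reconfigured per cycle. It also assumes that stages wrap around once the last virtual stage has been placed. `virtual_pipeline_schedule` is a hypothetical name.

```python
from typing import List, Tuple


def virtual_pipeline_schedule(virtual_stages: int, physical_stripes: int,
                              cycles: int) -> List[Tuple[int, int, int]]:
    """Round-robin pipelined reconfiguration (simplified, assumed model).

    In each cycle one physical stripe is reconfigured with the next virtual stage.
    Returns (cycle, stripe, stage) triples, all numbered from 1.
    """
    if virtual_stages < 1 or physical_stripes < 1:
        raise ValueError("need at least one stage and one stripe")
    schedule = []
    for t in range(cycles):
        stripe = t % physical_stripes  # stripe being reconfigured this cycle
        stage = t % virtual_stages     # virtual stage loaded into it
        schedule.append((t + 1, stripe + 1, stage + 1))
    return schedule
```

With 5 virtual stages on 3 stripes, cycles 1 to 3 place stages 1 to 3 on stripes 1 to 3. Cycles 4 and 5 place stages 4 and 5 on stripes 1 and 2. Cycle 6 reuses stripe 3 for stage 1.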
PEs as processors (PE ≈ PC), DSP-like: Quicksilver ACM, MorphTech rDSP, PicoChip PC101
ACM (Quicksilver): a Matrix Interconnect Network connecting adaptive, programmable, and domain nodes, grouped into Level 1, Level 2, and Level 3 clusters
ACM node types: adaptive node; domain nodes (filter, bit manipulation); programmable scalar node (RISC CPU, ARC). Programming language: Silver-C.
[Figure sequence: the JPEG pipeline (RGB to YCbCr, downsampling, DCT, quantization, VLC) mapped onto ACM nodes in three ways: (a) AXN, DBN, PSN; (b) AXN, DAN, DBN, PSN; (c) AXN, DBN, PSN plus two more AXNs]
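Quantization is one of the JPEG stages mapped onto ACM nodes here. It divides each DCT coefficient by the matching quantization-table entry and rounds the result. The sketch below assumes integer coefficients and round-half-away-from-zero; the slides do not specify a rounding rule. `quantize` is a hypothetical name.

```python
from typing import List


def quantize(coefficients: List[int], table: List[int]) -> List[int]:
    """JPEG-style quantization: divide each DCT coefficient by its table entry
    and round to the nearest integer, halves away from zero (assumed rule)."""
    if len(coefficients) != len(table):
        raise ValueError("coefficient block and quantization table differ in size")
    out = []
    for c, q in zip(coefficients, table):
        if q <= 0:
            raise ValueError("quantization steps must be positive")
        magnitude = (abs(c) + q // 2) // q  # integer rounding of |c| / q
        out.append(magnitude if c >= 0 else -magnitude)
    return out
```

Larger table entries map more coefficients to zero, which is what makes the following VLC stage effective.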
[Figure: granularity (4-bit, 8-bit, 16-bit, 32-bit) versus parallelism (3, 8, 10, 16, 100, 1000, many via time multiplexing) for FPGA, DAP/DNA, CS2112, DRL, DRP, PipeRench, DSP, PC101, ACM, traditional processors, chip multiprocessors, and common processors]
DRP
NEC DRP (Dynamically Reconfigurable Processor) 16 5 PE
DRP-1 Tile Vmem Hmem
DRP tile: 64 PEs (8×8) around a central State Transition Controller (STC), 8 HMEMs (1-port memory, 8-bit × 8192 entries), and 16 VMEMs (2-port memory, 8-bit × 256 entries) with VMEM controllers.
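The State Transition Controller selects which context the PE array runs in each cycle. This toy sketch makes three assumptions: each state selects the context with the same number, the PE array returns one event flag per cycle, and the limit is 16 contexts. `run_stc` and the transition table are hypothetical.

```python
from typing import Dict, List, Tuple

MAX_CONTEXTS = 16  # assumed number of contexts per tile


def run_stc(transitions: Dict[Tuple[int, bool], int], start: int,
            events: List[bool]) -> List[int]:
    """Toy State Transition Controller.

    Each cycle the PE array runs the context of the current state and raises
    an event flag; the STC looks up (state, flag) to choose the next state.
    Returns the contexts executed, one per cycle.
    """
    executed = []
    state = start
    for flag in events:
        if not 0 <= state < MAX_CONTEXTS:
            raise ValueError(f"state {state} has no context")
        executed.append(state)
        state = transitions[(state, flag)]
    return executed
```

Take a table that keeps returning to context 1 until the flag is raised and then moves to context 2. It turns the event sequence [False, False, False, True, False] into the context sequence [0, 1, 1, 1, 2].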
1. 2. 3. BDL DRP
DRP, VLIW, and FPGA compared: [Figure: the same granularity versus parallelism chart (FPGA, DAP/DNA, CS2112, DRL, DRP, PipeRench, DSP, PC101, ACM, chip multiprocessors, common processors)]
WASMII and DRP. [Figure: configuration data line, execution, input token registers, control]
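WASMII is a data-driven design: a configuration page becomes executable when the input tokens it needs are present. The sketch below models that selection rule. It assumes that pages are described by the token names they consume and produce, and that ties go to the first ready page. `run_data_driven` and the page and token names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# page name -> (tokens it needs, tokens it produces); names are hypothetical
Pages = Dict[str, Tuple[Set[str], Set[str]]]


def run_data_driven(pages: Pages, tokens: Set[str],
                    max_steps: int = 100) -> Tuple[List[str], Set[str]]:
    """Repeatedly execute a page whose input tokens are all available.

    Execution consumes the page's input tokens and emits its output tokens,
    so token arrival, not a program counter, decides the order.
    """
    available = set(tokens)
    order: List[str] = []
    for _ in range(max_steps):
        ready = [name for name, (needs, _) in pages.items() if needs <= available]
        if not ready:
            break
        name = ready[0]  # simple tie-break: first ready page in insertion order
        needs, produces = pages[name]
        available -= needs
        available |= produces
        order.append(name)
    return order, available
```

Suppose page B (needs y) is listed before page A (needs x) and the initial token is x. A still runs first and B runs next, because data arrival, not listing order, fixes the order.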
MDCT on a CPU and on DRP-1 (1576; window: 18)
MDCT on DRP-1: [Figure: Cosine Window I, butterfly function, Cosine Window II, two N-point DCTs, subtraction, window function]
DRP-1: Stream IN 32 (24), Stream OUT; 576/2 + 2, 9, 20, 576/2. 51 clk × 45 ns = 2295 ns. 13, 22. Versus Pentium III (600 MHz): 1.8
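The timing on this slide is cycle count × clock period: 51 cycles at a 45 ns period give 2295 ns. A 45 ns period corresponds to about 22.2 MHz, which appears to match the 22 listed on the slide. The sketch below does this arithmetic; the function names are hypothetical.

```python
def latency_ns(clock_cycles: int, cycle_time_ns: float) -> float:
    """Execution time = number of clock cycles x clock period."""
    return clock_cycles * cycle_time_ns


def clock_mhz(cycle_time_ns: float) -> float:
    """Clock frequency in MHz for a period given in nanoseconds."""
    return 1000.0 / cycle_time_ns
```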
Stream Data in Stream Data out
BDL (Behavior Description Language), gcc, Musketeer
RTL
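- Van der Pol / neural network: FPGA/PLD Conf., Jan. 2003
- Wavelet: CPSY, Jan. 2003
- MDCT: COOL Chips VI, Apr. 2003
- Viterbi: Reconf I, Sep. 2003
- RC6: Reconf III, Jan. 2004
- Reconf III, Jan. 2004
- FPL, Sep. 2003
- MIN: Reconf III, Jan. 2004
- Musketeer, FFT: FPGA/PLD Conf., Jan. 2004
- FCCM, Apr. 2004
- MDCT
CPSY: IEICE Technical Committee on Computer Systems; Reconf: IEICE Technical Committee on Reconfigurable Systems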
FFT on DRP compared with MIPS64 and a TI DSP:
                   DRP        DRP        MIPS64     TI DSP
Clock cycles/FFT   45056      11776      248047     83997
Clock              50 MHz     33 MHz     500 MHz    225 MHz
FFTs per second    1109       2802       2015       2678
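The FFT-rate row follows from the cycle counts and clock frequencies: clock frequency divided by clock cycles per FFT, truncated to whole FFTs. Reading the cycle row as cycles per FFT is an inference from that consistency. The sketch below reproduces the row; `ffts_per_second` and `RESULTS` are hypothetical names.

```python
def ffts_per_second(cycles_per_fft: int, clock_hz: int) -> int:
    """Whole FFTs completed per second: clock frequency / cycles per FFT, truncated."""
    return clock_hz // cycles_per_fft


# label: (clock cycles per FFT, clock frequency in Hz), as listed on the slide
RESULTS = {
    "DRP (50 MHz)": (45056, 50_000_000),
    "DRP (33 MHz)": (11776, 33_000_000),
    "MIPS64": (248047, 500_000_000),
    "TI DSP": (83997, 225_000_000),
}
```

The second DRP case reaches the highest rate despite the lowest clock, because it needs about 4× fewer cycles per FFT than the first.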
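C-level designs:
DWT: 14; 61 MHz; vs. Pentium III (600 MHz): 2
MDCT: 13; 22 MHz; vs. Pentium III (600 MHz): 1.8
Viterbi: 12; 33 MHz; vs. Pentium IV (2.4 GHz): 5
RC6: 13; 32 MHz; vs. MIPS64 (500 MHz): 6; vs. TMS320C6713 (225 MHz): 22
5; 38 MHz; vs. Pentium IV (2.5 GHz): 3; vs. TMS320C6713 (225 MHz): 17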
DRP: open problems
Chameleon, configurable processors, DSP, FPGA/CPLD, System on Chip
Open problems: configuration; processing element (8–32 bit): Xilinx, DRP, ACM, IPFlex
Big problem for reconfigurable architectures: ISA (Instruction Set Architecture)
SARA (Stream processing Architecture with Reconfigurable processor Array): DRP tiles connected by a Network-on-Chip (Black-bus). Applications: MP3, Wavelet, JPEG2000, Viterbi coder, alpha blender.
SARA: a CPU, RAM, and a stream I/O controller (I/O data) attached to DRP tiles linked by an interconnection network. Each tile contains PEs and a local STC. A global STC, a context shared memory, and a context loading controller supply the configuration data.
[Figure: Black-bus routing between tiles (0,0) to (3,1). Task 1 sends D0 and D1 with Send(D0, 0) and Send(D1, 1); Task 2 and Task 3 receive them with Receive(0, task1) and Receive(1, task1). Route 0 = (0, 1, 3), route 1 = (1, 2, 1, 0); packets carry routing IDs such as (D0, ID=3) and (D1, ID=2).]
VHw (NEC, 2004)
[Figure sequence: hardware A divided into modules A1 to A6 (A7 appears later), shown alongside a fixed region with different modules loaded at each step]
OS, CPU; university program: DRP, DAP/DNA-2