Proposal for Dynamically Adaptive Hardware


[Figure: an LSI containing hardware blocks A, B, C, D and a CPU. The CPU runs a loop such as: for (i = 0; i < k; i++) X[i] = X[i+j]; ...]

[Figure: PLD capacity from 1980 to 2000, on a scale of 10K, 100K, 1M, 10M. Labels: PLA, EEPROM, SPLD, FPGA, CPLD, SRAM FPGA. Other values on the slide: 912000, 45, 12, 1/100.]

Reconfigurable Systems
Years marked: 1990, 1992, 1993, 1995, 2000, 2002, 2003
Conferences: The 1st FPL, The 1st Japanese FPGA/PLD Conf., The 1st FCCM
Systems: SPLASH, SPLASH-2, RM-I, RM-II, PRISM-I, PRISM-II, MPLD, WASMII, RM-III, Cache Logic, YARDS, RM-IV, DISC, RM-V, DISC-II, HOSMII, ATTRACTOR, PipeRench, FIPSOC, RASH, CHIMERA, DRL, PCA, Chameleon, ACM, DRP, DAP/DNA, PCA2, DAP/DNA2

FPGA CPUFPGA SoC FPGA

SoPD (System on Programmable Device) CPU/Reconfigurable System OS FPGA 9151617 FCCM

FPGA/CPLD SoC CPUFPGA/PLD DRP

SoC (System-on-a-Chip)
[Figure: I/O, CPU, Memory, Application Specific Hardware]
CPU, Memory, I/O on one LSI
Cellular Phones, Network Controllers, Mobile Terminals
Problem! (JPEG2000, AES, Turbo code, ..)

CPU + FPGA?
[Figure: I/O, CPU, Memory, Application Specific Hardware, Common FPGA]
Xilinx Virtex II Pro (PowerPC), Altera Excalibur (ARM)

CPU + Dynamic Reconfigurable Processor
[Figure: I/O, CPU, Memory, Application Specific Hardware, Dynamic Reconfigurable Processor]
Coarse Grain Structure
C-level programming

C-level DRP

Chameleon CS2112 DPU
OP: Operations in C or Verilog
SIMD arrays and pipelines are formed with multiple DPUs.
[Figure: DPU datapath. Labels: Instruction, Routing MUX (x2), Barrel Shifter, Register and Mask (x2), OP, Register (x2)]

DRP-1

On-Chip Memory: 10s of micro-seconds
PACT Xpp, Elixent's DFA: On-Chip Memory

JPEG on a CPU
[Figure: processing stages in order: RGB to YCbCr, Downsampling, DCT, Quantization, VLC]
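The first two stages named on this slide can be sketched in software as follows. This is only an illustration, not the code behind the slide: the function names are hypothetical, the conversion uses the standard JFIF full-range coefficients, and the downsampling is assumed to be a 2x2 average (4:2:0).

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (JFIF full-range coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(value):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(value))))

    return clamp(y), clamp(cb), clamp(cr)


def downsample_2x2(plane):
    """Average each 2x2 block of a plane (assumed 4:2:0 chroma subsampling).

    `plane` is a list of equal-length rows with even width and height.
    """
    out = []
    for row in range(0, len(plane), 2):
        out_row = []
        for col in range(0, len(plane[row]), 2):
            total = (plane[row][col] + plane[row][col + 1]
                     + plane[row + 1][col] + plane[row + 1][col + 1])
            out_row.append(total // 4)
        out.append(out_row)
    return out
```

Each stage consumes the output of the previous one, which is why the chain maps naturally onto a pipeline of hardware blocks.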

[Figure: logic cells between input data and output data; a multiplexer selects configuration from one of the context SRAM slots 1, 2, ..., n]
Configuration memory: ROM, RAM
MPLD (1990), WASMII (1992), Xilinx (1997), NEC DRP (2002)
Partial context switch, runtime reconfiguration
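The idea of several context slots selected by a multiplexer can be modeled with a toy class. This is a conceptual sketch under stated assumptions, not a description of any of the devices listed: the class name, method names, and the use of Python functions as "configuration data" are all hypothetical.

```python
class MultiContextDevice:
    """Toy model: n configuration slots, one of which is active at a time."""

    def __init__(self, num_contexts):
        self.slots = [None] * num_contexts  # stands in for the context memory slots
        self.active = 0                     # slot currently selected by the multiplexer

    def load_context(self, index, function):
        """Write configuration data into a slot (can happen in the background)."""
        self.slots[index] = function

    def switch_context(self, index):
        """Select another preloaded slot; only the selector changes."""
        if self.slots[index] is None:
            raise ValueError(f"context {index} is not loaded")
        self.active = index

    def run(self, data):
        """Apply the currently selected configuration to the input data."""
        return self.slots[self.active](data)


device = MultiContextDevice(num_contexts=2)
device.load_context(0, lambda x: x + 1)   # hypothetical task N
device.load_context(1, lambda x: x * 2)   # hypothetical task N+1
first = device.run(3)
device.switch_context(1)
second = device.run(3)
```

The point of the model is that `switch_context` touches no configuration data, which is why a context switch can be far faster than reloading a bit stream.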

Task N Task N+1

Task N+2 Task N+1


FPGA NTT PCA-2 Elixent DFA PACT Xpp 1 NEC DRL NEC DRP PipeRench IPFlex DAP/DNA 4-5bit 8bit 16bit 32bit

3. C-level: C (BDL, Dataflow C, Stream C); ASIC: Verilog HDL, VHDL, C-level

/ DRP

90 PACT Xpp, Elixent DFA1000, DSP, NTT PCA/PCA-2

Xpp (PACT Informationstechnologie)
[Figure: array of PACs, each with a CM, surrounded by I/O blocks; one SCM]
PAC: Processing Array Cluster
CM: Configuration Manager
SCM: Supervising CM
PAE, Configuration controller
Xpp64 (8x8) Configuration 100 24

Elixent DFA1000
bit ALU, Register, RAM based switch box
[Figure: array of ALUs and registers (R)]

PCA (Plastic Cell Architecture) NTT BP PP

Plastic Part: LUT; Built-in Part (BP)
[Figure: basic cell made of LUTs, with PP and BP]
Hw fork

Chameleon; IPFlex DAP/DNA, DAP/DNA-2; NEC DRP; Quicksilver ACM; MorphTech rDSP; PicoChip PC101

Chameleon CS2112
[Block diagram: 32-bit PCI Bus, PCI Controller, RISC Core, 64-bit Memory Bus, Memory Controller, 128-bit RoadRunner Bus, Configuration Subsystem, DMA Subsystem, Reconfigurable Processing Fabric, 160-pin Programmable I/O]

Reconfigurable Processing Fabric in Chameleon
8 instructions stored in the CTL are executed in the DPU. The CTL can select the next instruction in the same cycle. Configuration can be changed by loading a bit stream.
[Figure: Tile 0 with LM, DPU, CTL; Slice 0 to Slice 3]
108 DPUs (Data Path Units): 4 Slices, 3 Tiles each
1 Tile: 9 DPUs (32bit ALU x 7, 16bit + 16bit multiplier x 2)

IPFlex DAP/DNA-2
[Block diagram: DDR SDR IF (64bit 166MHz), PCI IF (32bit 66MHz), DAP (RISC), DMA Controller, Interrupt Controller, Timer, SROM IF, GPIO, UART, Serial IF, BSU, DNA load buffer, DNA store buffer, DNA Matrix, DNA direct I/O (Async. in), DNA direct I/O (Async. out)]
368 ALU,

DNA
[Figure: SMA, RAM; FFs feeding Shift/Mask units, further FFs, ALUs, and output FFs]

PipeRench (CMU)
[Figure: stripes of PEs with pass registers, an interconnection network per stripe, and global buses]

Pipelined Reconfiguration
[Figure: cycles 1 to 6. Virtual pipeline: Stage 1 to Stage 5. Physical pipeline: Stage 1 to Stage 3, which host the virtual stages in turn.]
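How a five-stage virtual pipeline can share three physical stages is easier to see with a small schedule generator. This is a sketch under an assumed policy, not taken from the slide: virtual stage v is configured in cycle v onto physical stage (v - 1) mod P and stays there until it is overwritten. The function name and the stop-after-one-pass behaviour are hypothetical simplifications.

```python
def pipelined_schedule(num_virtual, num_physical, num_cycles):
    """Return, for each cycle, the virtual stage held by each physical stage.

    Assumed policy: virtual stage v (1-based) is configured in cycle v on
    physical stage (v - 1) % num_physical; None means "not yet configured".
    Only one pass over the virtual pipeline is modeled.
    """
    schedule = []
    resident = [None] * num_physical
    for cycle in range(1, num_cycles + 1):
        if cycle <= num_virtual:
            # Configure the next virtual stage, overwriting the oldest one.
            resident[(cycle - 1) % num_physical] = cycle
        schedule.append(list(resident))
    return schedule


for cycle, stages in enumerate(pipelined_schedule(5, 3, 6), start=1):
    print(cycle, stages)
```

Under this policy, physical stage 1 hosts virtual stages 1 and then 4, and physical stage 2 hosts 2 and then 5, so the application sees a pipeline deeper than the hardware.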

PEPC PE PE PE DSP QuicksilverACM MorphTechrDSP PicoChipPC101

ACM (Quicksilver)
[Figure: Matrix Interconnect Network joining Adaptive Nodes, Programmable Nodes, and Domain Nodes into Level1, Level2, and Level3 Clusters]

Adaptive Node Domain Node Filter, Bit Programmable Scalar Node RISC CPU (ARC 1 31 Silver-C

[Figure sequence: JPEG stages (RGB to YCbCr, Downsampling, DCT, Quantization, VLC) assigned to ACM nodes. First slide: AXN, DBN, PSN. Second slide: AXN, DAN, DBN, PSN. Third slide: AXN, DBN, PSN, AXN, AXN.]

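[Figure: design space of reconfigurable devices. Axes: Granularity (4bit, 8bit, 16bit, 32bit), Parallelism, Time Multiplexing. Other scale marks: 3, 8, 16, Many; 10, 100, 1000. Devices plotted: FPGA, DAP/DNA, CS2112, DRL, DRP, PipeRench, DSP, PC101, Traditional Processors, ACM, Chip-Multiprocessor, Common Processor.]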

DRP

NEC DRP (Dynamically Reconfigurable Processor) 16 5 PE

DRP-1 Tile Vmem Hmem

DRP Tile
[Figure: array of 64 PEs with a State Transition Controller at the center; HMEM blocks along the top and bottom edges; VMEM blocks and VMEM ctrl along the sides]
VMEM (2-port memory), HMEM (1-port memory)
8bit, 256entry, 8092entry

1. 2. 3. BDL DRP

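DRP, VLIW, FPGA
[Figure: design space of reconfigurable devices. Axes: Granularity (4bit, 8bit, 16bit, 32bit), Parallelism, Time Multiplexing. Other scale marks: 3, 8, 16, Many; 10, 100, 1000. Devices plotted: FPGA, DAP/DNA, CS2112, DRL, DRP, PipeRench, DSP, PC101, ACM, Chip-Multiprocessor, Common Processor.]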

WASMIIDRP WASMII DRP DRP Configuration Data line Execution Input Token Registers Control

MDCT CPU DRP-1 1576 Window: 18 MDCT

MDCT on DRP-1
[Figure: processing blocks: Cosine Window I, Butterfly Function, Cosine Windows II, N-point DCT (x2), Subtraction, Window Function]

DRP-1
[Figure: Stream IN, 576/2 + 2, 32(24), 9, 20, 576/2, Stream OUT]
51 clk x 45 ns = 2295 ns/1
13, 22; Pentium III 600MHz: 1.8
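The latency figure on this slide can be checked with a few lines of arithmetic. The values 51 clk and 45 ns come from the slide; treating 45 ns as the clock period, and the helper names, are assumptions made for this sketch.

```python
def latency_ns(clock_cycles, period_ns):
    """Total latency in nanoseconds for a given cycle count and clock period."""
    return clock_cycles * period_ns


def clock_mhz(period_ns):
    """Clock frequency in MHz for a given period in nanoseconds."""
    return 1000.0 / period_ns


total = latency_ns(51, 45)        # 51 * 45 = 2295 ns, as on the slide
frequency = clock_mhz(45)         # about 22.2 MHz if 45 ns is the period
```

A 45 ns period corresponds to roughly 22 MHz, which would be consistent with the value 22 listed next to this entry.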

Stream Data in Stream Data out

BDL (Behavior Description Language), gcc, Musketeer

RTL

Van der Pol / Neural Network: FPGA/PLD Conf. 2003.1
Wavelet: CPSY 2003.1
MDCT: CoolChips VI 2003.4
Viterbi: Reconf I 2003.9
RC6: Reconf III 2004.1
Reconf III 2004.1
FPL 2003.9
MIN: Reconf III 2004.1
Musketeer
FFT: FPGA/PLD Conf. 2004.1, FCCM 2004.4
MDCT
CPSY: Reconf:

FFT

                        DRP      DRP      MIPS64   TI DSP
Clock cycles per FFT    45056    11776    248047   83997
Clock frequency         50MHz    33MHz    500MHz   225MHz
FFT/s                   1109     2802     2015     2678
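The three rows of numbers on this slide are mutually consistent if the third row is the clock frequency divided by the first row, rounded down. The sketch below checks that reading. The row meanings (cycles per FFT, clock frequency, FFTs per second) are an inference from the arithmetic, and the column labels are shortened by hand.

```python
# (label, clock cycles per FFT, clock frequency in Hz), values from the slide
FFT_RESULTS = [
    ("DRP (50 MHz)", 45056, 50_000_000),
    ("DRP (33 MHz)", 11776, 33_000_000),
    ("MIPS64", 248047, 500_000_000),
    ("TI DSP", 83997, 225_000_000),
]


def ffts_per_second(cycles_per_fft, frequency_hz):
    """Whole number of FFTs completed per second at the given clock."""
    return frequency_hz // cycles_per_fft


throughput = {
    label: ffts_per_second(cycles, freq)
    for label, cycles, freq in FFT_RESULTS
}
```

On this reading, the 33 MHz DRP configuration completes more FFTs per second than the 500 MHz MIPS64 because it needs about twenty times fewer cycles.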

C

DWT       14   61MHz   Pentium III (600M): 2
          13   22      Pentium III (600M): 1.8
Viterbi   12   33MHz   Pentium IV 2.4G: 5
RC6       13   32MHz   MIPS64 (500M): 6, TMS320C6713 (225M): 22
          5    38      Pentium IV (2.5G): 3, TMS320C6713 (225M): 17

DRP Open Problem

Chameleon Configurable Processor DSP FPGA/CPLD System On Chip

Open Problems Configuration Configuration Processing Element 8-32bit Xilinx DRP ACM,IPFlex

Reconfigurable Architecture Big Problem ISA (Instruction Set Architecture) Reconfigurable Architecture

SARA (Stream processing Architecture with Reconfigurable processor Array)
Network-on-Chip: Black-bus
DRP, SARA
MP3, Wavelet, JPEG2000, Viterbi coder, alpha blender

SARA (Stream processing Architecture with Reconfigurable processor Array)
[Block diagram: DRP; I/O data; CPU; RAM; Stream I/O controller; eight Tiles joined by an Interconnection Network; Context Shared Memory; Local STC; PE; Global STC; Context loading controller; Configuration data]

Tile, Black-bus, Router
[Figure: tiles at coordinates (0,0) to (3,1). Task 1 issues Send(D0, 0) and Send(D1, 1); Task 2 issues Receive(0, task1); Task 3 issues Receive(1, task1). Packets are labeled with data and ID, for example (D0,ID=3), (D0,ID=1), (D0,ID=0), (D1,ID=2), (D1,ID=1), (D1,ID=0). Route labels: 0 = (013), 1 = (1210).]

VHw (NEC2004 VH VH

Hw A A A1 A2 A3 A4 A6 A5

Fixed Region A1 A A A1 A2 A3 A4 A6 A5

Fixed Region A A5 A A1 A2 A3 A4 A6 A5

Fixed Region A1 A A7 A A4 A A2 A7 A3 A A6 A5

/ OSCPU University Program DRP, DAP/DNA-2