
Packet Transfer Networks for 3-D Stacked Chips with Inductive Coupling

Daisuke Sasaki, Hiroki Matsutani, Yasuhiro Take, Yuki Ono, Yukinori Nishiyama, Tadahiro Kuroda and Hideharu Amano

Faculty of Science and Technology, Keio University
Graduate School of Information Science and Technology, The University of Tokyo

Wireless chip interconnect using inductive coupling, which allows known-good dies to be stacked after chip fabrication, is attracting attention for its flexibility and communication performance. To make the best use of these benefits, a communication scheme between cores on different chips must be established. As such a scheme for the wireless chip interconnect, we propose a ring-based NoC with vertical bubble flow control and compare it with a ring-based NoC with virtual-channel flow control and with a conventional vertical bus structure. These communication schemes are implemented on a real wireless 3-D IC and evaluated in terms of performance and area. Simulation results show that the prototype chip works at 200MHz. The wireless interconnect supports 4GHz double-data-rate transfer and consumes 33.8mW on average. The ring-based NoCs achieve significantly higher throughput than the bus-based one. The ring-based NoC with vertical bubble flow control outperforms the one with conventional virtual-channel flow control in terms of cost per performance.

1. Intellectual Property (IP) 1 System-on-a-Chip (SoC) IP SoC System-in-Package (SiP) 3 1)

© 2011 Information Processing Society of Japan

2) 3) 3 4)5) 6)7) SiP LSI CPU SoC LSI LSI LSI. 2 3 NoC (Network-on-Chip) NoC 4 2 5 3 6
2. 2 2) 3) 8) 4) SoC CMOS 8GHz 10^-16 BER (Bit Error Rate) 7) 0.14pJ/bit 30µm × 30µm I/O GPU DRAM 7) MuCCRA-Cube 6) 3
3. 1 1

[Figure labels: New chip; Plane #0; (1) Shared bus; (2) Ring network]
1 2 1) 2) Plane #0 NoC 5 3
3.1 2 4 8 Plane #0 3
[Figure labels: 1-packet buffer space (Occupied / Empty); To / From]
4
3.2 NoC 3 NoC 3
3.2.1 9) 2 dateline dateline
3.2.2 10)11)
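Section 3.2.1 cites 9) and mentions a dateline together with two virtual channels. The sketch below illustrates the general dateline technique described in that textbook, not necessarily the exact Cube-0 router logic. The function name `dateline_vcs`, the node numbering, and the placement of the dateline on the wrap-around link are all assumptions made for illustration.

```python
def dateline_vcs(src, dst, n_nodes):
    """List (node, vc) for every input buffer a packet occupies after leaving src.

    Assumes a unidirectional ring 0 -> 1 -> ... -> n_nodes-1 -> 0 whose
    dateline sits on the wrap-around link (n_nodes-1 -> 0). Both choices
    are illustrative assumptions, not taken from the paper.
    """
    if not (0 <= src < n_nodes and 0 <= dst < n_nodes):
        raise ValueError("node index out of range")
    vc = 0                       # every packet is injected on VC0
    node = src
    visited = []
    while node != dst:
        nxt = (node + 1) % n_nodes
        if nxt == 0:             # this hop crosses the dateline
            vc = 1               # the packet stays on VC1 for the rest of its route
        visited.append((nxt, vc))
        node = nxt
    return visited


if __name__ == "__main__":
    # A packet from node 2 to node 1 on a 4-node ring wraps around once.
    print(dateline_vcs(2, 1, 4))
```

Because no packet ever moves from VC1 back to VC0, the channel dependency cycle that a ring would otherwise form is broken at the dateline.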

NoC 4 1 3 1 1 2 2 3 1 4 1 2 3 2 1 2 3 3 2 virtual cut-through (VCT) 5
4. 3 3 Cube-0 2
[Fig. 5 (Cube-0) labels: (1) Control part; (2) Downlink; (3) Uplink; (4) Vertical bus; CK; Core0; Core1; Bus ctrl; Bonding wires for power, clock, & chip ID; On-chip router; 35-bit data; 2-bit credit]
4.1 Cube-0 5 Cube-0 4 1) 2) NoC Downlink 3) Uplink 4) 2 Core 0 1 NoC 2 1 4.3 4.5 CK 4GHz 4) NoC 2) 3) Downlink Uplink NoC 4.2 NoC 4.2 3 NoC NoC point-to-point CK 4GHz CK RTL
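The bubble flow control of 10) 11), used here with virtual cut-through (VCT) switching, can be summarized by one rule: a packet already in the ring may advance when the next buffer has room for one packet, while a newly injected packet needs room for two, so that at least one empty packet slot (the bubble) always remains in the ring. The sketch below is a minimal illustration of that general rule and may differ from the paper's vertical variant. It assumes the 5-flit packets and 15-flit buffer that appear in the evaluation, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

PACKET_FLITS = 5  # packet size used in the evaluation (5-flit)


@dataclass
class RingInputBuffer:
    """Input buffer of a ring router, counted in whole packets (VCT switching)."""
    capacity_flits: int = 15     # "Bubble (15-flit)" configuration
    occupied_packets: int = 0

    @property
    def free_packet_slots(self) -> int:
        return self.capacity_flits // PACKET_FLITS - self.occupied_packets


def can_forward(next_buf: RingInputBuffer) -> bool:
    """A packet already travelling in the ring needs one free packet slot."""
    return next_buf.free_packet_slots >= 1


def can_inject(next_buf: RingInputBuffer) -> bool:
    """A new packet needs two free slots so that one bubble is left behind."""
    return next_buf.free_packet_slots >= 2


if __name__ == "__main__":
    buf = RingInputBuffer(occupied_packets=2)
    print(can_forward(buf), can_inject(buf))
```

Keeping one slot empty guarantees that some packet in the ring can always move, which avoids deadlock without a second virtual channel.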

credit credit-based Uplink Downlink I/O 5 2 1 I/O NoC NoC 4 4 1
4.3 5 Cube-0 2 1 NoC 32-bit 3-bit 35-bit 5 200MHz 45-bit
4.4 NoC 5 Cube-0 2 35-bit 3 16 1 200MHz 6 Cube-0 3 200MHz
4.5 8 1 8
4.6 Cube-0 Fujitsu e-shuttle 65nm CMOS 2.1mm Synopsys Design Compiler TOP IC Compiler Cadence Virtuoso 6 Cube-0 (1) (4) 5 7 Cube-0 4 Cube-0

[Fig. 7 (caption residue: Cube-0 4)]

[Fig. 8 (caption residue: 4): Network throughput [flits/cycle/core] under Uniform, Neighbor, and Adversary traffic patterns for Vertical bus, 2-VC (10-flit), 2-VC (15-flit), 2-VC (20-flit), 2-VC (30-flit), and Bubble (15-flit)]

Table 1 Cube-0
  Process technology     Fujitsu CS202SZ 65nm
  Chip size              2.1mm × 2.1mm
  System clock           200MHz
  # of ports             3
  # of VCs               2
  Router input buffer    16-flit FIFO for each VC
  Flit size              32-bit data + 3-bit control
  Packet size            5-flit
  Inductor for bubble    150µm × 150µm
  Inductor for bus       250µm × 250µm
  Inductor bandwidth     35 [bit/cycle/channel]

1 Cube-0 NoC 4 NoC 2.7 30µm × 30µm 5)

5. Cube-0 Cube-0
5.1 Cube-0 Cadence NC-Verilog RTL RTL 3
Uniform traffic N H = N/2
Neighbor traffic 1 H = 1
Adversary traffic H = N - 1
5-flit 3 15-flit Bubble (15-flit) 2 2-VC (n-flit) (n/2)-flit 2-VC (15-flit) 0 10-flit 1 5-flit
8 9 4 8 Vertical bus 2-VC (n-flit) Bubble (15-flit) 2-VC (15-flit) Bubble (15-flit)

[Fig. 9 (caption residue: 8): Network throughput [flits/cycle/core] under Uniform, Neighbor, and Adversary traffic patterns for Vertical bus, 2-VC (10-flit), 2-VC (15-flit), 2-VC (20-flit), 2-VC (30-flit), and Bubble (15-flit)]
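The hop counts quoted for the three traffic patterns (H = N/2, H = 1, H = N - 1) are what a unidirectional ring of N nodes gives. The sketch below checks this by enumeration. It assumes that Neighbor traffic targets the next node in ring direction and that Adversary traffic targets the node immediately behind the source; these definitions are inferred from the hop counts, and the function names are illustrative.

```python
def ring_hops(src, dst, n):
    """Hops from src to dst on a unidirectional ring of n nodes."""
    return (dst - src) % n


def average_hops(n, pattern):
    """Average hop count H over all sources for one traffic pattern."""
    if n < 2:
        raise ValueError("a ring needs at least two nodes")
    total = 0
    count = 0
    for src in range(n):
        if pattern == "uniform":        # every other node is equally likely
            dsts = [d for d in range(n) if d != src]
        elif pattern == "neighbor":     # next node in ring direction
            dsts = [(src + 1) % n]
        elif pattern == "adversary":    # node just behind the source
            dsts = [(src - 1) % n]
        else:
            raise ValueError(f"unknown pattern: {pattern}")
        for dst in dsts:
            total += ring_hops(src, dst, n)
            count += 1
    return total / count


if __name__ == "__main__":
    for n in (4, 8):
        print(n, [average_hops(n, p) for p in ("uniform", "neighbor", "adversary")])
```

For Uniform traffic the destinations lie 1, 2, ..., N - 1 hops away with equal probability, so the mean is (N(N - 1)/2)/(N - 1) = N/2.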

11 10 NoC NoC Bubble (15-flit) 2-VC (15-flit) Neighbor traffic Bubble (15-flit) 15-flit 2-VC (30-flit) Verilog Verilog 200MHz
5.2 6 Cube-0 2 350µm 10 cb inputc outputc 3 inputc outputc 3 4.4 2 16 FIFO 1 inputc0 2 90% 3 cb 12 2 1 inputc
5.3 11 SPICE 12 SPICE 11 4GHz 8Gbps NoC 12 1 33.8mW Cube-0
6.
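The 8Gbps figure follows from double-data-rate signaling on a 4GHz clock. As a rough consistency check, the sketch below compares it with the rate needed to move one 35-bit flit per 200MHz system cycle. Mapping exactly one flit per cycle onto one serial channel is an assumption suggested by the "35 [bit/cycle/channel]" entry in Table 1, and the constant and function names are illustrative.

```python
SYSTEM_CLOCK_HZ = 200_000_000      # system clock from Table 1
LINK_CLOCK_HZ = 4_000_000_000      # CK of the inductive-coupling link
FLIT_BITS = 32 + 3                 # 32-bit data + 3-bit control


def ddr_rate_bps(clock_hz):
    """Double data rate: one bit on each clock edge."""
    return 2 * clock_hz


def flit_stream_bps(flit_bits, system_clock_hz):
    """Rate needed to send one flit per system clock cycle (assumed mapping)."""
    return flit_bits * system_clock_hz


def bit_slots_per_cycle(link_clock_hz, system_clock_hz):
    """Serial bit slots available on the link during one system clock cycle."""
    return ddr_rate_bps(link_clock_hz) // system_clock_hz


if __name__ == "__main__":
    print(ddr_rate_bps(LINK_CLOCK_HZ))                         # link rate in bit/s
    print(flit_stream_bps(FLIT_BITS, SYSTEM_CLOCK_HZ))         # required rate in bit/s
    print(bit_slots_per_cycle(LINK_CLOCK_HZ, SYSTEM_CLOCK_HZ)) # slots per cycle
```

Under these assumptions a 35-bit flit needs 7Gbps, which fits within the 40 bit slots that the 8Gbps link offers per system cycle.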

NoC NoC NoC Cube-0 Cube-0 200MHz Cube-0 Cube-0 Cube-1 MIPS 12)

1) Davis, W. R., Wilson, J., Mick, S., Xu, J., Hua, H., Mineo, C., Sule, A. M., Steer, M. and Franzon, P. D.: Demystifying 3D ICs: The Pros and Cons of Going Vertical, IEEE Design and Test of Computers, Vol. 22, No. 6, pp. 498-510 (2005).
2) Ezaki, T., Kondo, K., Ozaki, H., Sasaki, N., Yonemura, H., Kitano, M., Tanaka, S. and Hirayama, T.: A 160Gb/s Interface Design Configuration for Multichip LSI, Proceedings of the International Solid-State Circuits Conference (ISSCC'04), pp. 140-141 (2004).
3) Burns, J., McIlrath, L., Keast, C., Lewis, C., Loomis, A., Warner, K. and Wyatt, P.: Three-Dimensional Integrated Circuits for Low-Power High-Bandwidth Systems on a Chip, Proceedings of the International Solid-State Circuits Conference (ISSCC'01), pp. 268-269 (2001).
4) Mizoguchi, D., Yusof, Y. B., Miura, N., Sakurai, T. and Kuroda, T.: A 1.2Gb/s/pin Wireless Superconnect Based on Inductive Inter-Chip Signaling (IIS), Proceedings of the International Solid-State Circuits Conference (ISSCC'04), pp. 142-151 (2004).
5) Miura, N., Ishikuro, H., Sakurai, T. and Kuroda, T.: A 0.14pJ/b Inductive-Coupling Inter-Chip Data Transceiver with Digitally-Controlled Precise Pulse Shaping, Proceedings of the International Solid-State Circuits Conference (ISSCC'07), pp. 358-359 (2007).
6) Saito, S., Kohama, Y., Sugimori, Y., Hasegawa, Y., Matsutani, H., Sano, T., Kasuga, K., Yoshida, Y., Niitsu, K., Miura, N., Kuroda, T. and Amano, H.: MuCCRA-Cube: a 3D Dynamically Reconfigurable Processor with Inductive-Coupling Link, Proceedings of the Field-Programmable Logic and Applications (FPL'09), pp. 6-11 (2009).
7) Miura, N., Kasuga, K., Saito, M. and Kuroda, T.: An 8Tb/s 1pJ/b 0.8mm²/Tb/s QDR Inductive-Coupling Interface Between 65nm CMOS and 0.1µm DRAM, Proceedings of the International Solid-State Circuits Conference (ISSCC'10), pp. 436-437 (2010).
8) Kanda, K., Antono, D. D., Ishida, K., Kawaguchi, H., Kuroda, T. and Sakurai, T.: 1.27-Gbps/pin, 3mW/pin Wireless Superconnect (WSC) Interface Scheme, Proceedings of the International Solid-State Circuits Conference (ISSCC'03), pp. 186-187 (2003).
9) Dally, W. J. and Towles, B.: Principles and Practices of Interconnection Networks, Morgan Kaufmann (2004).
10) Puente, V., Beivide, R., Gregorio, J. A., Prellezo, J. M., Duato, J. and Izu, C.: Adaptive Bubble Router: A Design to Improve Performance in Torus Networks, Proceedings of the International Conference on Parallel Processing (ICPP'99), pp. 58-67 (1999).
11) Abad, P., Puente, V., Prieto, P. and Gregorio, J. A.: Rotary Router: An Efficient Architecture for CMP Interconnection Networks, Proceedings of the International Symposium on Computer Architecture (ISCA'07), pp. 116-125 (2007).
12) Yuan, Y., Yoshida, Y., Yamagishi, N. and Kuroda, T.: Chip-to-Chip Power Delivery by Inductive Coupling with Ripple Canceling Scheme, Proceedings of the International Conference on Solid State Devices and Materials, pp. 502-503 (2007).