Packet Transfer Networks for 3-D Stacked Chips with Inductive Coupling

Daisuke Sasaki, Hiroki Matsutani, Yasuhiro Take, Yuki Ono, Yukinori Nishiyama, Tadahiro Kuroda and Hideharu Amano

Faculty of Science and Technology, Keio University
Graduate School of Information Science and Technology, The University of Tokyo

Wireless chip interconnect using inductive coupling, which allows known-good dies to be stacked after chip fabrication, has attracted attention for its high flexibility and communication performance. To make the best use of these benefits, a communication scheme between cores on different chips must be established. As the communication scheme for the wireless chip interconnect, we propose a ring-based NoC with vertical bubble flow control and compare it with a ring-based NoC with virtual-channel flow control and a conventional vertical bus structure. These communication schemes are implemented on a real wireless 3-D IC and evaluated in terms of performance and area. Simulation results show that the prototype chip works at 200MHz. The wireless interconnect supports 4GHz double-data-rate transfer and consumes 33.8mW on average. The ring-based NoCs achieve significantly higher throughput than the bus-based one. The ring-based NoC with vertical bubble flow control outperforms the one with conventional virtual-channel flow control in terms of cost per performance.

1. Intellectual Property (IP) 1 System-on-a-Chip (SoC) IP SoC System-in-Package (SiP) 3 1)

© 2011 Information Processing Society of Japan
2) 3) 3 4) 5) 6) 7) SiP LSI CPU SoC LSI LSI LSI. 2 3 NoC (Network-on-Chip) NoC 4 2 5 3 6

2. 2 2) 3) 8) 4) SoC CMOS 8GHz 10 16 BER (Bit Error Rate) 7) 0.14pJ/bit 30µm × 30µm I/O GPU DRAM 7) MuCCRA-Cube 6) 3

3. 1 1
[Figure residue: New chip; Plane #0; (1) Shared bus; (2) Ring network]

1) 2) Plane #0 NoC 5 3

3.1 2 4 8

[Figure residue: Plane #0; 1-packet buffer space (Occupied / Empty); To / From]

3.2 NoC 3 NoC 3

3.2.1 9) 2 dateline dateline

3.2.2 10) 11)
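As a rough illustration of the bubble flow control cited above as 10) 11), the sketch below models a unidirectional ring whose buffers are managed in whole-packet units, as virtual cut-through requires. It follows the rule as it is usually stated in the literature: a packet already in the ring may advance when the next buffer has one free packet slot, while a new packet may be injected only when two slots are free, so that at least one empty slot (the bubble) always remains in the ring. The names `Ring`, `stress` and `inject_need` are hypothetical. The 4-node ring, 15-flit buffers and 5-flit packets are assumptions chosen to resemble the evaluation settings; this is not the Cube-0 router logic.

```python
from collections import deque

PACKET_FLITS = 5                       # assumed packet length in flits
BUFFER_FLITS = 15                      # assumed ring buffer depth in flits
SLOTS = BUFFER_FLITS // PACKET_FLITS   # buffer capacity in whole packets


class Ring:
    """Unidirectional ring; buffer i feeds buffer (i + 1) % n."""

    def __init__(self, n_nodes, slots=SLOTS, inject_need=2):
        self.slots = slots
        self.inject_need = inject_need  # 2 = bubble rule, 1 = no bubble
        self.bufs = [deque() for _ in range(n_nodes)]

    def free(self, i):
        return self.slots - len(self.bufs[i])

    def total_free(self):
        return sum(self.free(i) for i in range(len(self.bufs)))

    def try_inject(self, src, packet):
        """A new packet enters the downstream buffer only if enough slots are free."""
        dst = (src + 1) % len(self.bufs)
        if self.free(dst) >= self.inject_need:
            self.bufs[dst].append(packet)
            return True
        return False

    def try_forward(self, i):
        """A packet already in the ring needs just one free slot to advance."""
        dst = (i + 1) % len(self.bufs)
        if self.bufs[i] and self.free(dst) >= 1:
            self.bufs[dst].append(self.bufs[i].popleft())
            return True
        return False


def stress(n_nodes=4, rounds=50, inject_need=2):
    """Worst case: every core injects every round and no packet ever leaves."""
    ring = Ring(n_nodes, inject_need=inject_need)
    deadlocked = False
    for r in range(rounds):
        for src in range(n_nodes):
            ring.try_inject(src, (src, r))
        moved = sum(ring.try_forward(i) for i in range(n_nodes))
        deadlocked = deadlocked or moved == 0
    return ring.total_free(), deadlocked


if __name__ == "__main__":
    print("bubble rule   :", stress(inject_need=2))
    print("no bubble rule:", stress(inject_need=1))
```

In this toy model the ring with the two-slot injection rule keeps free slots and its packets keep circulating, whereas the one-slot variant fills every buffer and stops moving. No second virtual channel is involved, which is the usual motivation for choosing bubble flow control.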
NoC 4 1 3 1 1 2 2 3 1 4 1 2 3 2 1 2 3 3 2 virtual cut-through (VCT) 5

4. 3 3 Cube-0 2

[Figure 5: Cube-0. Labels: (1) Control part; (2) Downlink; (3) Uplink; (4) Vertical bus; Core0; Core1; Bus ctrl; CK; On-chip router; Bonding wires for power, clock, & chip ID; 35-bit data; 2-bit credit]

4.1 Cube-0 5 Cube-0 4 1) 2) NoC Downlink 3) Uplink 4) 2 Core 0 1 NoC 2 1 4.3 4.5 CK 4GHz 4) NoC 2) 3) Downlink Uplink NoC 4.2 NoC 4.2 3 NoC NoC point-to-point CK 4GHz CK RTL
credit credit-base Uplink Downlink I/O 5 2 1 I/O NoC NoC 4 4 1

4.3 5 Cube-0 2 1 NoC 32-bit 3-bit 35-bit 5 200MHz 45-bit

4.4 NoC 5 Cube-0 2 35-bit 3 16 1 200MHz 6 Cube-0 3 200MHz

4.5 8 1 8

4.6 Cube-0 Fujitsu e-shuttle 65nm CMOS 2.1mm Synopsys Design Compiler TOP IC Compiler Cadence Virtuoso 6 Cube-0 (1) (4) 5 7 Cube-0 4 Cube-0
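The figures quoted so far (a 32-bit data plus 3-bit control flit, a 200MHz system clock, and the 4GHz double-data-rate inductive link from the abstract) can be cross-checked with a little arithmetic. The sketch below assumes that one flit per system cycle is serialized over a single inductive channel. That mapping is an assumption made for illustration, and the function names are hypothetical.

```python
FLIT_BITS = 32 + 3                 # 32-bit data + 3-bit control
SYSTEM_CLOCK_HZ = 200_000_000      # 200MHz system clock
LINK_CLOCK_HZ = 4_000_000_000      # 4GHz inductive-coupling link clock
TRANSFERS_PER_LINK_CYCLE = 2       # double data rate: both clock edges


def link_rate_bps():
    """Raw bit rate of one inductive channel."""
    return LINK_CLOCK_HZ * TRANSFERS_PER_LINK_CYCLE


def required_rate_bps():
    """Bit rate needed to move one flit per system cycle (assumed mapping)."""
    return FLIT_BITS * SYSTEM_CLOCK_HZ


def link_bits_per_system_cycle():
    """How many bits the link can carry during one 200MHz cycle."""
    return link_rate_bps() // SYSTEM_CLOCK_HZ


if __name__ == "__main__":
    print(f"link rate      : {link_rate_bps() / 1e9:.1f} Gb/s")
    print(f"required rate  : {required_rate_bps() / 1e9:.1f} Gb/s")
    print(f"bits per cycle : {link_bits_per_system_cycle()}")
```

Under these assumptions the link offers 8 Gb/s, or 40 bits per system cycle, against the 7 Gb/s needed for a 35-bit flit per cycle.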
[Figure 7: Cube-0 4]

[Figure 8: Network throughput [flits/cycle/core] versus traffic pattern (Uniform, Neighbor, Adversary). Series: Vertical bus; 2-VC (10-flit); 2-VC (15-flit); 2-VC (20-flit); 2-VC (30-flit); Bubble (15-flit)]

Table 1  Cube-0

  Process technology     Fujitsu CS202SZ 65nm
  Chip size              2.1mm × 2.1mm
  System clock           200MHz
  # of ports             3
  # of VCs               2
  Router input buffer    16-flit FIFO for each VC
  Flit size              32-bit data + 3-bit control
  Packet size            5-flit
  Inductor for bubble    150µm × 150µm
  Inductor for bus       250µm × 250µm
  Inductor bandwidth     35 [bit/cycle/channel]

1 Cube-0 NoC 4 NoC 2.7 30µm × 30µm 5)

5. Cube-0 Cube-0

5.1 Cube-0 Cadence NC-Verilog RTL RTL 3 Uniform traffic N H = N/2 Neighbor traffic 1 H = 1 Adversary traffic H = N - 1 5-flit 3 15-flit Bubble (15-flit) 2 2-VC (n-flit) (n/2)-flit 2-VC (15-flit) 0 10-flit 1 5-flit 8 9 4 8 Vertical bus 2-VC (n-flit) Bubble (15-flit) 2-VC (15-flit) Bubble (15-flit)

[Figure 9: Network throughput [flits/cycle/core] versus traffic pattern (Uniform, Neighbor, Adversary). Series: Vertical bus; 2-VC (10-flit); 2-VC (15-flit); 2-VC (20-flit); 2-VC (30-flit); Bubble (15-flit)]
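The hop counts quoted for the three traffic patterns (H = N/2 for uniform, H = 1 for neighbor, H = N - 1 for adversary) are what a unidirectional ring of N nodes gives. The sketch below checks this under two assumptions that the surviving text does not spell out: packets travel in one direction only, and the adversary pattern sends each packet to the node immediately upstream of the source. All function names are hypothetical.

```python
from fractions import Fraction


def hops(src, dst, n):
    """Hop count from src to dst on a unidirectional ring of n nodes."""
    return (dst - src) % n


def uniform_hops(n):
    """Average over all source/destination pairs with src != dst."""
    total = sum(hops(s, d, n) for s in range(n) for d in range(n) if s != d)
    return Fraction(total, n * (n - 1))


def neighbor_hops(n):
    """Every node sends to the next node downstream."""
    return Fraction(sum(hops(s, (s + 1) % n, n) for s in range(n)), n)


def adversary_hops(n):
    """Every node sends to the node just upstream (assumed definition)."""
    return Fraction(sum(hops(s, (s - 1) % n, n) for s in range(n)), n)


if __name__ == "__main__":
    for n in (4, 8):
        print(n, uniform_hops(n), neighbor_hops(n), adversary_hops(n))
```

Because each packet occupies H links on its way, a longer average path leaves less ring capacity per core, which is one reason the adversary pattern is the most demanding of the three.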
11 10 NoC NoC Bubble (15-flit) 2-VC (15-flit) Neighbor traffic Bubble (15-flit) 15-flit 2-VC (30-flit) Verilog Verilog 200MHz

5.2 6 Cube-0 2 350µm 10 cb inputc outputc 3 inputc outputc 3 4.4 2 16 FIFO 1 inputc0 2 90% 3 cb 12 2 1 inputc

5.3 11 SPICE 12 SPICE 11 4GHz 8Gbps NoC 12 1 33.8mW Cube-0

6.
NoC NoC NoC Cube-0 Cube-0 200MHz Cube-0 Cube-0 Cube-1 MIPS 12) 21,

References

1) Davis, W. R., Wilson, J., Mick, S., Xu, J., Hua, H., Mineo, C., Sule, A. M., Steer, M. and Franzon, P. D.: Demystifying 3D ICs: The Pros and Cons of Going Vertical, IEEE Design and Test of Computers, Vol. 22, No. 6, pp. 498-510 (2005).
2) Ezaki, T., Kondo, K., Ozaki, H., Sasaki, N., Yonemura, H., Kitano, M., Tanaka, S. and Hirayama, T.: A 160Gb/s Interface Design Configuration for Multichip LSI, Proceedings of the International Solid-State Circuits Conference (ISSCC'04), pp. 140-141 (2004).
3) Burns, J., McIlrath, L., Keast, C., Lewis, C., Loomis, A., Warner, K. and Wyatt, P.: Three-Dimensional Integrated Circuits for Low-Power High-Bandwidth Systems on a Chip, Proceedings of the International Solid-State Circuits Conference (ISSCC'01), pp. 268-269 (2001).
4) Mizoguchi, D., Yusof, Y. B., Miura, N., Sakurai, T. and Kuroda, T.: A 1.2Gb/s/pin Wireless Superconnect Based on Inductive Inter-Chip Signaling (IIS), Proceedings of the International Solid-State Circuits Conference (ISSCC'04), pp. 142-151 (2004).
5) Miura, N., Ishikuro, H., Sakurai, T. and Kuroda, T.: A 0.14pJ/b Inductive-Coupling Inter-Chip Data Transceiver with Digitally-Controlled Precise Pulse Shaping, Proceedings of the International Solid-State Circuits Conference (ISSCC'07), pp. 358-359 (2007).
6) Saito, S., Kohama, Y., Sugimori, Y., Hasegawa, Y., Matsutani, H., Sano, T., Kasuga, K., Yoshida, Y., Niitsu, K., Miura, N., Kuroda, T. and Amano, H.: MuCCRA-Cube: a 3D Dynamically Reconfigurable Processor with Inductive-Coupling Link, Proceedings of the Field-Programmable Logic and Applications (FPL'09), pp. 6-11 (2009).
7) Miura, N., Kasuga, K., Saito, M. and Kuroda, T.: An 8Tb/s 1pJ/b 0.8mm²/Tb/s QDR Inductive-Coupling Interface Between 65nm CMOS and 0.1um DRAM, Proceedings of the International Solid-State Circuits Conference (ISSCC'10), pp. 436-437 (2010).
8) Kanda, K., Antono, D. D., Ishida, K., Kawaguchi, H., Kuroda, T. and Sakurai, T.: 1.27-Gbps/pin, 3mW/pin Wireless Superconnect (WSC) Interface Scheme, Proceedings of the International Solid-State Circuits Conference (ISSCC'03), pp. 186-187 (2003).
9) Dally, W. J. and Towles, B.: Principles and Practices of Interconnection Networks, Morgan Kaufmann (2004).
10) Puente, V., Beivide, R., Gregorio, J. A., Prellezo, J. M., Duato, J. and Izu, C.: Adaptive Bubble Router: A Design to Improve Performance in Torus Networks, Proceedings of the International Conference on Parallel Processing (ICPP'99), pp. 58-67 (1999).
11) Abad, P., Puente, V., Prieto, P. and Gregorio, J. A.: Rotary Router: An Efficient Architecture for CMP Interconnection Networks, Proceedings of the International Symposium on Computer Architecture (ISCA'07), pp. 116-125 (2007).
12) Yuan, Y., Yoshida, Y., Yamagishi, N. and Kuroda, T.: Chip-to-Chip Power Delivery by Inductive Coupling with Ripple Canceling Scheme, Proceedings of the International Conference on Solid State Devices and Materials, pp. 502-503 (2007).