2008/1/15 (12) 1
(12) http://ssc.pe.titech.ac.jp
VLSI 100W
$P_d = f_{clk} C V_{dd}^2$
$I_{leak} = I_{sub} + I_g$
$I_{sub} \propto \exp\left(-\frac{q V_T}{n k T}\right)$
$I_g \propto \exp\left(5.6 V_{gd} - 10 T_{ox} - 2.5\right)$
Gordon E. Moore, ISSCC 2003.
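A minimal Python sketch of the two scaling relations on this slide: the quadratic dependence of switching power on V_dd, and the exponential dependence of subthreshold leakage on V_T. The operating point (100 MHz, 1 nF), the slope factor n = 1.5, and the temperature of 300 K are assumptions chosen only for illustration; they are not taken from the slide.

```python
import math

Q = 1.602176634e-19   # elementary charge [C]
K_B = 1.380649e-23    # Boltzmann constant [J/K]


def dynamic_power(f_clk_hz, c_farad, vdd_volt):
    """Switching power P_d = f_clk * C * V_dd^2, in watts."""
    return f_clk_hz * c_farad * vdd_volt ** 2


def subthreshold_factor(vt_volt, n=1.5, temp_k=300.0):
    """Relative subthreshold leakage, exp(-q*V_T / (n*k*T)).

    n and temp_k are assumed values for illustration only.
    """
    return math.exp(-Q * vt_volt / (n * K_B * temp_k))


# Hypothetical operating point: 100 MHz clock, 1 nF of switched capacitance.
p_full = dynamic_power(100e6, 1e-9, 3.0)
p_half = dynamic_power(100e6, 1e-9, 1.5)
print(f"P_d at 3.0 V: {p_full:.3f} W, at 1.5 V: {p_half:.3f} W")

# Lowering V_T from 0.6 V to 0.3 V raises the leakage factor sharply.
leak_ratio = subthreshold_factor(0.3) / subthreshold_factor(0.6)
print(f"Subthreshold leakage grows by a factor of about {leak_ratio:.0f}")
```

Halving V_dd cuts the switching power to one quarter, while halving V_T multiplies the leakage term by several orders of magnitude under these assumptions, which is the tension the slide points at.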
CMOS LSI
CMOS LSI LSI
LSI LSI LSI
ALU
D C B A
CMOS $V_{dd}$
$T = 1 / f_{clk}$
A) $V_{dd}$, $P_d$: F/F F/F
B) $V_{dd}$, $P_d$: T
C) $V_{dd}$, $P_d$:
LSI
LSI

                              CPU      DSP      Dedicated LSI
Clock frequency (MHz)         450      50       25
# of operations/clock         2        16       96
Operating speed (GOPS)        0.9      0.8      2.4
Pd (mW)                       7000     110      12
Pd / Operating speed
(mW/GOPS)                     7800     138      5

Pd/GOPS: 3 orders of magnitude
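The derived rows of the comparison can be re-computed from the clock frequency, operations per clock, and Pd. The Python sketch below does this; the function and variable names are illustrative, and the input figures are the ones on the slide.

```python
import math

# (clock frequency in MHz, operations per clock, Pd in mW), as on the slide
chips = {
    "CPU": (450, 2, 7000),
    "DSP": (50, 16, 110),
    "Dedicated LSI": (25, 96, 12),
}


def operating_speed_gops(clock_mhz, ops_per_clock):
    """Operating speed in GOPS = clock (MHz) * operations per clock / 1000."""
    return clock_mhz * ops_per_clock / 1000.0


def mw_per_gops(clock_mhz, ops_per_clock, pd_mw):
    """Power cost of one GOPS, in mW."""
    return pd_mw / operating_speed_gops(clock_mhz, ops_per_clock)


efficiency = {name: mw_per_gops(*spec) for name, spec in chips.items()}
for name, value in efficiency.items():
    print(f"{name:14s} {value:8.1f} mW/GOPS")

# Gap between the general-purpose CPU and the dedicated LSI, in decades.
orders = math.log10(efficiency["CPU"] / efficiency["Dedicated LSI"])
print(f"CPU vs dedicated LSI: about {orders:.1f} orders of magnitude")
```

The re-computed values agree with the rounded figures in the table (7800, 138, and 5 mW/GOPS).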
Multi-core processor
- Intel: 2 CPU
- IBM, Sony, Toshiba: (1+8) CPU
- NEC: 3 CPU
- Fujitsu: 8W VLIW
DSP
[Figure: DSP block diagram. Labels: DATA MEMORY (Data ROM, Data RAM, Double Access, AMA, AMB), PROGRAM CONTROL (IR, INST ROM, DEC, STACK, IP), I/O (DMA CONT, SERIAL, PARALLEL), M BUS 16, A BUS 16, B BUS 16, EXT CLK, PLL, CLK GEN, DSP-CORE (DATA REGS, ALU, SAT, BSFT, ACS, DPU, RB-MAC, ACC, MAC).]
- Special memory scheme to realize double speed MAC
- Viterbi accelerator
- Dedicated MAC unit
- Double speed MAC scheme
- Redundant binary number
ALU ACS
[Figure: ACS datapath. Inputs PM0(t-1), BMa(t), PM1(t-1), BMb(t); ALU split into Upper 8-bits and Lower 8-bits (Add); COMPARATOR (Compare); SHIFT REG and REG (Select); output PM0(t).]
PM0(t) = min[(PM0(t-1)+BMa(t)), (PM1(t-1)+BMb(t))]
Two Adds, one Compare and one Select -> ACS operation
- Normal operation: The ALU is used as a 16-bit processing unit.
- ACS operation: The ALU is used as two 8-bit adders.
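The ACS step above can be sketched in Python as follows. This is a behavioural illustration only, not the hardware on the slide: the function names, the choice of which path metric sits in the upper and lower 8-bit lane, and the wrap-around on 8-bit overflow are all assumptions made for the example.

```python
def split_add_8x2(a16, b16):
    """Model a 16-bit ALU used as two independent 8-bit adders.

    No carry propagates from the lower lane into the upper lane.
    Each lane wraps modulo 256 (an assumption; real hardware may saturate).
    """
    lo = ((a16 & 0xFF) + (b16 & 0xFF)) & 0xFF
    hi = (((a16 >> 8) & 0xFF) + ((b16 >> 8) & 0xFF)) & 0xFF
    return (hi << 8) | lo


def acs(pm0_prev, bma, pm1_prev, bmb):
    """Add-Compare-Select: PM0(t) = min(PM0(t-1)+BMa(t), PM1(t-1)+BMb(t)).

    Returns the surviving path metric and the select bit
    (0: path through state 0 survives, 1: path through state 1 survives).
    """
    # Two adds in one pass: pack both operand pairs into 16-bit words.
    # Assumed lane layout: state 0 in the upper lane, state 1 in the lower.
    sums = split_add_8x2((pm0_prev << 8) | pm1_prev, (bma << 8) | bmb)
    cand0 = (sums >> 8) & 0xFF
    cand1 = sums & 0xFF
    # One compare and one select.
    select = 1 if cand1 < cand0 else 0
    return (cand1 if select else cand0), select


# Example with small hypothetical metrics.
metric, select = acs(pm0_prev=10, bma=3, pm1_prev=7, bmb=4)
print(metric, select)
```

In a Viterbi decoder the select bit would be shifted into a register as the survivor-path decision, which is presumably the role of the SHIFT REG in the figure.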
33%
[Figure: Comparison of the number of clock cycles needed to realize an 11.2kbps VSELP CODEC. Vertical axis: Clock Number Ratio [%], 0 to 100. Bars: DSP w/o MAC & Viterbi Accelerators (ALU); DSP w/ MAC & Viterbi Accelerators. Segments: Misc, Block Floating, Error Correction, MAC. Reductions: -9.0%, -4.7%, -8%, -11.4%; Total: -33.1%.]
SoC
MPEG4 Codec: 0.18um e-DRAM, 31M Tr, 90 mW @ 54 MHz
T. Hashimoto, et al., "A 90mW MPEG4 Video Codec LSI with the Capability for Core Profile," ISSCC, Dig. of Tech. Papers, pp. 140-141, 2001.
MPEG4 Decoder: 0.18um CMOS, 11M Tr, 11 mW @ 27/54 MHz
15 fps (Core@L1 decode); 30 fps (Simple@L3 decode); 15 fps (Core@L1 decode)
M. Ohashi, et al., "A 27MHz 11.1 mW MPEG4 Video Decoder LSI for Mobile Application," ISSCC, Dig. of Tech. Papers, pp. 366-367, 2002.
MPEG4 LSI DSP
[Figure: MPEG4 LSI block diagram. Labels: VCE (Video Codec Engines): ME, VLC, DCT, VLD, PNR, PAD, CAD, COMP, IDCT, each with LM; Programmable DSP: DSP Core, Inst. Mem, Data Mem; HIF (Host I/F); MIF (Memory I/F); DRAM (2Mb), DRAM (2Mb), DRAM (16Mb); VPU (Video Processing Unit): Video Input, Main Filter, Sub Graph., Video Output.]
ALUALU
Performance for Core Decoding
Codec Decoding Performance: 5fps, 20fps
[Figure: Core@L1 decoding cycle counts WITH the Engines (axis 0 to 40 Kcycles) and WITHOUT the Engines (axis 0 to 200 Mcycles), split into HW Engine and Software. Labels: CAD, PAD, COMP, Texture Decoding. Values: 6.1%, 26.5%, 6.8%, 63%, 24%.]
Sakiyama et al., Symp. on VLSI Circuits '97
Adaptive supply voltage control circuits
0.35um CMOS, 2.2M Tr
20 MIPS, 12 mW (1.2V, internal)
Leak current: 500uA (active), 1uA (standby)
[Figure: Plot with vertical axis 0 to 150 and horizontal axis 10 to 50. Curve "Fixed supply voltage" (3.0V), V_tn = 0.6V, V_tp = -0.7V; curve (1.0V to 2.2V), V_tn = 0.30V, V_tp = -0.33V. Annotated points: 1/5, 1/3, 1/2; 1.2V, 1.7V, 2.1V.]
[Figure: Supply voltage control loop. Labels: External supply voltage, Internal supply voltage, (Critical path) Replica, Replica delay, Clock, Phase comparison, Control pulse, Feedback loop.]
[Figure: Pass/Fail plot. Vertical axis: Clock period (ns), 10 to 60. Horizontal axis: Internal supply voltage (V), 1.0 to 2.0. Minimum margin: 25mV. Maximum margin: 100mV.]
DC/DC: 94%, 15mVpp.
S. Sakiyama et al., ISSCC '99
[Figure: Noise waveforms, Conventional vs. Improved. Labels: High Noise Chip Inductor, Low Noise Choke Coil, Low Noise Chip Inductor; 300mV, 15mV.]
DC/DC
[Figure: Load, Waveforms]
$V = L \frac{di}{dt}$
$\frac{V_{in} - V_o}{L} T_{on} = \frac{V_o}{L} T_{off} = \Delta I$
$V_o = V_{in} \frac{T_{on}}{T_{on} + T_{off}}$
$\Delta V = \frac{\Delta I \, (T_{on} + T_{off})}{8C}$
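As a worked example of the DC/DC relations above, the Python sketch below evaluates the output voltage, the inductor current ripple, and the output voltage ripple for one operating point. The component values (10 uH, 10 uF) and the switching times are hypothetical, picked only so that a 3.0 V input yields a 1.2 V output; the function name is also illustrative.

```python
def buck_steady_state(v_in, t_on, t_off, inductance, capacitance):
    """Steady-state step-down converter relations.

    V_o     = V_in * T_on / (T_on + T_off)
    delta_I = V_o / L * T_off          (equals (V_in - V_o) / L * T_on)
    delta_V = delta_I * (T_on + T_off) / (8 * C)
    """
    period = t_on + t_off
    v_out = v_in * t_on / period
    delta_i = v_out / inductance * t_off
    delta_v = delta_i * period / (8.0 * capacitance)
    return v_out, delta_i, delta_v


# Hypothetical operating point: 1 MHz switching, 40% duty, 10 uH, 10 uF.
v_out, delta_i, delta_v = buck_steady_state(
    v_in=3.0, t_on=0.4e-6, t_off=0.6e-6,
    inductance=10e-6, capacitance=10e-6,
)
print(f"V_o = {v_out:.2f} V")
print(f"inductor ripple = {delta_i * 1e3:.1f} mA")
print(f"output ripple   = {delta_v * 1e3:.2f} mV")

# Consistency check: the current rise during T_on equals the fall during T_off.
rise = (3.0 - v_out) / 10e-6 * 0.4e-6
print(f"rise during T_on = {rise * 1e3:.1f} mA")
```

Because the ripple voltage scales with 1/(8C) and the ripple current with 1/L, raising either the capacitance or the inductance lowers the output ripple in this idealized model, which ignores the parasitic resistance and inductance of real components.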