24 FFT Self-Timeed Pipeline Implementation of Adaptive FFT for Different Rate Signals 1155064 2013 2 5
FFT HetNet (FFT) FFT. (STP) FFT STP FFT FFT FFT FPGA(Altera stratixii) HetNet FFT STP i
Abstract Self-Timeed Pipeline Implementation of Adaptive FFT for Different Rate Signals Hajime ooiso In recent years, such as seen in HetNet, there is an increasing demand for heterogeneous wireless communication system that allows mobile terminals to choose the best wireless interface among available ones. In such terminals, multiple signal sequences of different interfaces have to be processed in parallel. However, in the conventional clock-synchronized digital circuit, it is difficult to design adaptive scheduling for various combinations of the interfaces. Therefore, this study focuses on the self-timed pipeline (STP) circuit, which inherently has the clockless passive operation mode without clockbased scheduling, and implements an experimental STP circuit for fast Fourier transform (FFT), which is one the heaviest functions in modern wireless signal processing. In the self-timed FFT circuit, radix-r butterfly operations in Cooley-Tukey FFT algorithm are continuously invoked based on stream-oriented parallel processing. The number of invoked butterfly operations is controlled by color identifiers that we introduced as abstraction of available processing resource constrained by concurrently processed wireless signal sequences. As a result of FPGA implementation, it is confirmed that the proposed circuit for radix-2 1024-point FFT performs well under available hardware resource. key words Signal processing, Heterogeneous wireless system, Fast Fourier transform, Self-timed pipeline ii
1 1 2 FFT 4 2.1...................................... 4 2.2 FFT................ 4 2.3 FFT................................ 7 2.3.1....................... 8 2.3.2 input str................................ 11 2.3.3 Tf index................................ 12 2.3.4 Tf lookup............................... 14 2.3.5 stream gen.............................. 15 2.3.6 Butterfly............................... 16 2.3.7 result str............................... 17 2.3.8 next.................................. 17 2.4...................................... 20 3 FFT 21 3.1...................................... 21 3.2............................. 21 3.3................................. 24 3.3.1 MEM................................ 25 3.3.2................................. 27 3.3.3........................... 30 3.4................................... 31 iii
3.5...................................... 37 4 38 42 43 iv
2.1 Npoint FFT.............................. 5 2.2 Npoint FFT................ 6 2.3 FFT....................... 7 2.4 8 FFT....................... 8 2.5 FFT....................... 9 2.6 in place.......................... 10 2.7 input str........................... 11 2.8 Tf index........................... 13 2.9 Tf lookup.......................... 14 2.10................................ 15 2.11 stream gen.......................... 16 2.12 Butterfly........................... 17 2.13 next 1............................. 18 2.14 next 2............................. 19 3.1 FFT....................... 25 3.2 Mem............................. 26 3.3 Radix4...................... 27 3.4 SelectStep.......................... 29 3.5 SameStep.......................... 30 3.6 FFT................ 32 3.7 opr = 01.......................... 34 3.8 opr = 10.......................... 35 v
4.1 double buffer....................... 41 vi
1.1 LAN............................. 2 1.2.......................... 2 3.1................................... 23 3.2 LTE FFT........................ 24 3.3 opr mem....................... 26 3.4............................. 36 3.5 FPGA...................... 36 vii
1 PC 2 1 LAN LAN 2 3 (3G) 3.9G 3.9G 4G 4G 1.1 LAN 1.2 1
IEEE 802.11n IEEE 802.11ac IEEE 802.11ad OFDM OFDM OFDM 40MHz 160MHz 9GHz 64QAM 256QAM 64QAM 600Mbps 6.9Gbps 6.8Gbps 1.1 LAN W-CDMA(3G) LTE(3.9G) LTE-Advanced(4G) DL : CDMA DL : OFDMA DL : OFDMA UL : CDMA UL : SC-FDMA UL : SC-FDMA 5MHz 20MHz 100MHz HPSK, QPSK 16QAM, 64QAM 16QAM, 64QAM DL : 384kbps DL : 300Mbps DL : 3Gbps UP : 64kbps UP : 75Mbps UP : 1.5bps 1.2 FFT FFT STP Self-timed Pipeline) FFT FPGA(Altera StratixII) 2 FFT 2
FFT 3 FFT STP FPGA 4 3
2 FFT 2.1 FFT FFT FFT FFT FFT FFT 2.2 FFT FFT FFT N f s 1 x 1 FFT 2.1 x 1 1 f s 1 x 1(0) 1 f s 1 N 1 f s 1 x 1(N 1) x 1 FFT F F T fs1 1 N f s 1 2 FFT F F T fs12 x 1 (N) F F T fs1 1 F F T fs1 2 4
2.2 FFT 2.1 Npoint FFT x 1 (N) 1 f s 1 x 1(N 1) 1 f s 1 F F T fs11 1 FFT FFT FFT x 1 (N 1) FFT 1 f s 1 FFT x 1 (N 1) x 1 x 1 (N 1) FFT x 1 (N) 2 2.2 F F T fs1 1 x 1 (0) N 2f s 1 1 2f s 1 x 1(N 1) N 1 f s 1 F F T fs12 x 1 (N) 3N 2f s 1 x 1(N 1) x 1 (N) N+1 2f s 1 FFT FFT FFT 5
2.2 FFT 2.2 Npoint FFT 2 x 1 x 2 FFT FFT N x 1 x 2 f s 1 f s 2 f s 1 = 2 f s 2 x 1 x 2 N+1 2f s 1 N+1 f s 1 FFT FFT FFT FFT FFT FFT STP (C ) 6
2.3 FFT 2.3 FFT 2.3 FFT Desecrate Fourier Transform : DFT) f(n) F (k) DFT F (K) = N 1 n=0 f(n)w kn N (2.1) W n = e j 2 N (2.2) W 1 N 2.1 2.2 DFT N 2 N(N 1) DFT FFT FFT 2 Cooley-Tukey FFT FFT N 2 log 2N Nlog 2 N 2.1 2.2 N = 8 FFT 2.4 2.4 2 7
2.3 FFT FFT FFT R FFT N log R N N R N R FFT FFT FFT FFT [2] FFT 2.4 8 FFT 2.3.1 [2] 8
2.3 FFT R 2.5 color color color 2.5 FFT input str 1 N(R 1) R (Base Address : BA) BA color BA color max color 9
2.3 FFT stream gen BA Tf index Tf lookup Tf index Butterfly stream gen Tf lookup R result str R BA stream gen next FFT BA next FFT 2.6 in place 10
2.3 FFT in place FFT in place FFT 4 FFT 16 in place 2.6 BA BA skip skip N R step+1 4 BA BA + skip BA + 2skip BA + 3skip 2.3.2 input str 2.7 input str input str 2.7 11
2.3 FFT input str DATA Address Address FFT log 2 N 1 DATA 16bit 16bit 32bit store color color source 4 BA skip BA BA + skip BA + 2skip BA + 3skip BA + 3skip 3skip BA color source 0 input str BA 1step 4 BA 0 N 4 color source BA color source BA color BA color source index 0 color source color max color max color source color source color max color source 2.3.3 Tf index Tf index 2.8 BA Tf index input str log 2 color max color 2 index log 2 N 12
2.3 FFT 2.8 Tf index BA BA index 2 index0 index0 BA 0 0 BA 0 index1 index 1 BA index2 index3 BA index sterilizer (ser ) ser 4 color 4 index 13
2.3 FFT 2.3.4 Tf lookup 2.9 Tf lookup Tf lookup 2.9 Tf lookup Tf index Tf lookup 2.10 x y 2.10 2 1 j 3 1 4 j 2 Tf look Tf index 2 14
2.3 FFT FFT N 4 N 4 j 1 2.10 2.3.5 stream gen stream gen 2.11 stream gen input str stream gen index0 BA index skip skip ser ser Tf index index 15
2.3 FFT 2.11 stream gen LD LD 2.3.6 Butterfly Butterfly 2.12 Butterfly stream gen Tf lookup 2 Butterfly 4 complexmul 2 16bit Deserializer (des ) des index index ser 16
2.3 FFT 2.12 Butterfly 2.3.7 result str result str stream gen Butterfly 2 str str index index 2.3.8 next next result str next BA FFT BA step BA next 1 2.13 step next 2 2.14 next 1 result str index BA BA 17
2.3 FFT 2.13 next 1 BA BA index 0 index0 BA 0 step step BA 0 next 1 BA 0 step step step BA BA BA color max BA color max FFT 1 step N N R step nextskip(ns) BA 18
2.3 FFT 2.14 next 2 BA ns BA BA (BA mod skip) ns BA BA FFT N N step BA N step next 2 New step next 2 New step next 1 step step step step 19
2.4 log 4 N 1 color FFT FFT reorder step skip ns skip ns BA 2.4 FFT FFT FFT FFT FFT FPGA(Altera StratixII) 20
3 FFT 3.1 2 FFT FFT FFT FFT color color FFT color FFT FFT 3.2 FFT FFT STP 21
3.2 STP FFT FFT FFT 1 FFT 2 FFT f s FFT N FFT N+1 2f s FFT FFT N 2 log2n Nlog2N FFT FFT step step step N R FFT FFT FFT color color color FFT (color max ) color max ASIC FPGA ASIC FPGA FPGA 22
3.2 FFT Altera stratix II 16 1024 4 color max 2 ( ) 32 bit (16bit 16bit) 3.1 I/O ASIC FPGA ASIC FPGA ASIC FPGA 3.1 FPGA Altera stratix II FFT FFT FFT FFT 4 FPGA color color max 23
3.3 (MHz) 1.25 2.5 5 10 15 20 (MHz) 1.92 3.86 7.68 15.36 23.04 30.72 FFT 128 256 512 1024 1536 2048 3.2 LTE FFT 2 FFT 16 1024 4 16 color max 2 FFT 1024 LTE 4 FFT LTE FFT 3.2 LTE 20MHz FFT 2048 FFT 4 4 10MHz FFT 16 1024 2 3.3 STP 3.1 STP input str stream gen result str color 24
3.3 3 mem 3 2 1 7 5 Marge 13 3.1 FFT 3.3.1 MEM MEM 3.2 (opr) (Initial address) (Initial data), (Stream gen address) (Radix4buterfly address) (Radix4butterfly result) FFT 25
3.3 opr 00 01 10 11 FFT 3.3 opr mem (Reorder address) (Tf index address mem opr / opr mem 3.3 opr / 3.2 Mem 26
3.3 3.3.2 Input str Radix4 Next Input str 1 Input str Mem Mem color BA. BA color max color BA BA color color max color max 3.3 Radix4 Radix4 complex sync addition 27
3.3 3.3 complex sync 4 4 stream gen index index sync index addition 4 addition -1 addition 4 FFT N FFT 1-1 N -N bit FFT 1-1 1-1 bit Next SelectStep SameStep NextStep Next FFT BA SelectStep 3.4 SelectStep FFT step step register BA step BA 0 BA=0 register 28
3.3 3.4 SelectStep step SameStep C cp SameStep step BA Stream gen skip SameStep 3.5 BA BA color max color max BA step R step N R step BA BA N R step NextStep step 29
3.3 3.5 SameStep BA step skip FFT SameStep BA skip BA skip NextStep BA SelectStep renewstep BA skip renewstep opr reorder 3.3.3 Tf index Stream gen Result str Reorder.Tf index Stream gen Stream gen 30
3.4 Tf index Stream gen BA skip BA Next BA Input str BA opr Tf index (C-element with N-count Copy: CNC) Tf index Stream gen Result str MEM 1 1 Reorder FFT FFT BA index Result str Reorder Serialize pre Serialize post Serialize pre / 1 / index index CNC / index Serialize post opr 3.4 FFT (3.1) ( (R+S)N R S 1 S + S 2 N ) log color R N T f N (3.1) max f s 31
3.4 FFT N FFT R Tf S S 1 R S 2 1 log R N step 1step, 3.6 FFT FFT 3.6 Mem FFT 1 R Serialize pre stage 1 R Radix4 sync stage Serialize pre stage 32
3.4 1 serialize pre stage Radix4 sync stage Serialize pre stage (S 1 ) 6 Serialize pre stage Radix4 sync stage (S 2 ) 7 MEM 3 opr Radix4 sync stage Mem Radix4Butterfly Radix4 sync stage Input str Next Input str Next FFT Radix4Butterfly opr 01 3.7 opr 01 Mem opr Tf lndex Stream gen 01 Tf lndex Stream gen Mem Mem Radix4Butterfly Tf lndex Stream gen Radix4Butterfly Result str opr 01 Serialize pre stage 33
3.4 Serialize pre stage Serialize pre stage 4 R + S 1step FFT 1step (R + S)N S 1 (R+S)N R S 1 S 3.7 opr = 01 opr 10 opr 10 3.8 opr Result str 10 Result str Radix4butterfly Mem Mem Next reorder BA Tf index Stream gen 34
3.4 3.8 opr = 10 opr 10 Radix4butterfly 1 1 1step S 2 N 2 opr 1 1step (R+S)N R S 1 S + S 2 N color max (R+S)N R color max S 1 S + S 2 N 3.4 Serialize pre stage 5ns 20 ns Tf STP C STP 35
3.4 (ns) Merge stage 7.45 MEM stage 14.4 Input Str stage 12.5 complex stage 20.2 sync stage 14.8 addtion stage 18.8 SelectStep stage 13.8 SameStep stage 14.3 NextStep stage 13.8 Strean gen stage 14.6 Tf index stage 14.2 Serialize pre stage 20.0(5 4) Serialize post stege 10.1 3.4 (ns) 424369 FPGA 448556 3.5 FPGA FFT N FPGA 3.5 94% FFT 2.4 MHz ASIC 36
3.5 40nm ASIC 25 MHz 3.2 40nm ASIC 10 MHz LTE FFT 3.5 FFT Altera FPGA StratixII FFT 1024 94% 40 nm ASIC 10 MHz LTE FFT 37
4 PC 3G 3.9G LAN FFT FFT FFT FFT FFT N f s FFT N f s FFT FFT FFT FFT FFT 38
STP STP C FFT FFT step N R FFT FFT FFT color FFT STP 2 13 1 7 5 Marge FPGA stratixii 96% 40 nm ASIC FFT N 25 MHz FFT 10 MHz LTE 4G G HetNet 39
FFT STP Mem 1 1 Mem Mem in place step skip double buffer [2] 4 FFT 16 FFT double buffer 4.1 double buffer step 4 step in place double buffer write after read 40
4.1 double buffer step wirte after read double buffer step N N 4 double buffer in place 41
. 2 4 42
[1] H. Terada, et al, DDMP s: Self-Timed Super-Pipelined Data-Driven Multimedia Processors, Proceedings of the IEEE, 87(2), pp.282 296, Feb. 1999. [2] FFT KUT 2013. [3] KUT 2002. [4] 95 2007 2 [5] FUJITSU.63,6,p.681-688(11,2012) [6] S. He, et al., A new approach to pipeline FFT processor, Int.Parallel Processing Symposium, 1996. [7] S. He, et al., Design and Implementation of a 1024-point Pipeline FFT Processor, Custom Integrated Circuits Conf., 1998. [8] Y. Lin, et al., A Dynamic Scaling FFT Processor for DVB-T Applications, IEEE J. Solid-State Circuits, Nov. 2004. [9] Y. Lin, et al., A 1-GS/s FFT/IFFT Processor for UWB Applications, IEEE J. Solid-State Circuits, Aug. 2005. [10] Y. Chen, et al., A 2.4-Gsample/s DVFS FFT Processor for MIMO OFDM Communication Systems, IEEE J. Solid-State Circuits, May. 2008. 43