A High-Flexible Low-Latency Memory-Based FFT Processor for 4G, WLAN, and Future 5G

A high-throughput programmable fast Fourier transform (FFT) processor is designed to support 16- to 4096-point FFTs and 12- to 2400-point discrete Fourier transforms (DFTs) for 4G, wireless local area network (WLAN), and future 5G. A 16-path data-parallel memory-based architecture is selected as a tradeoff between throughput and cost. Several improvements are made to obtain a hardware-efficient, high-speed processor. To maximize hardware reuse, a reconfigurable butterfly unit is proposed that computes eight radix-2, four radix-3/4, or two radix-5/8 butterflies in parallel, or one radix-16 butterfly, in one clock cycle. Twiddle factor multipliers based on different schemes are optimized and compared, and a modified coordinate rotation digital computer (CORDIC) scheme is ultimately implemented to minimize hardware cost while supporting both FFTs and DFTs. An optimized conflict-free data access scheme is also proposed to support multiple parallel butterflies of any radix. The processor is designed as a general IP and can be implemented using a processor synthesizer (application-specific instruction-set processor designer). Electronic design automation synthesis results based on a 65-nm technology show a processor area of 1.46 mm². The processor supports a 972 MS/s 4096-point FFT at 250 MHz with a power consumption of 68.64 mW and a signal-to-quantization-noise ratio of 66.1 dB. The proposed processor achieves better normalized throughput per unit area than available state-of-the-art designs.
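
To illustrate why a coordinate rotation digital computer (CORDIC) rotator can serve as a twiddle factor multiplier for both power-of-two FFT sizes and non-power-of-two DFT sizes, the following is a minimal sketch of the basic textbook CORDIC rotation. It is not the modified scheme of the paper, whose details are not given in the abstract. The function names `cordic_rotate` and `twiddle_multiply`, the iteration count, and the use of floating-point arithmetic are assumptions made for illustration; a hardware version would use fixed-point shifts and adds.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians with basic CORDIC (|angle| <= pi/2).

    Hypothetical helper for illustration; floating point stands in for the
    fixed-point shift-and-add datapath a hardware rotator would use.
    """
    # Elementary rotation angles atan(2^-i) and the accumulated CORDIC gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotation direction for this step
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    # Compensate the constant gain introduced by the micro-rotations.
    return x / gain, y / gain


def twiddle_multiply(re, im, k, n, iterations=16):
    """Multiply re + j*im by W_n^k = exp(-2j*pi*k/n) using CORDIC.

    Works for any n (e.g. 4096 or 12) because the angle is computed
    directly rather than read from a size-specific coefficient ROM.
    """
    angle = -2.0 * math.pi * (k % n) / n  # in (-2*pi, 0]
    if angle < -math.pi:
        angle += 2.0 * math.pi  # wrap into [-pi, pi)
    # Pre-rotate by +/-90 degrees so the residual angle lies in the
    # CORDIC convergence range.
    if angle > math.pi / 2:
        re, im = -im, re  # multiply by +j
        angle -= math.pi / 2
    elif angle < -math.pi / 2:
        re, im = im, -re  # multiply by -j
        angle += math.pi / 2
    return cordic_rotate(re, im, angle, iterations)
```

For example, `twiddle_multiply(0.5, -0.25, 5, 12)` approximates the product of `0.5 - 0.25j` and `exp(-2j*pi*5/12)`. The same routine handles a 4096-point FFT twiddle and a 12-point DFT twiddle, which is the property that makes a rotator attractive when one datapath must cover both transform families.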
