Automated Scalable Address Generation Patterns for 2-Dimensional Folding Schemes in Radix-2 FFT Implementations

Hardware implementations of the Fast Fourier Transform (FFT) are widely valued because they outperform sequential software solutions. Because of the large number of operations involved, most hardware FFT designs fold their structure completely or partially to use resources efficiently. A folding operation requires a permutation block, which is typically implemented using either permutation logic or address generation. Addressing schemes are more resource-efficient than permutation logic. We propose a systematic and scalable procedure for generating permutation-based address patterns for any power-of-2 transform size and any folding factor in FFT cores. To support this procedure, we develop a mathematical formulation based on Kronecker product algebra for address sequence generation and data flow patterns in FFT core computations, a well-defined procedure for scaling address generation schemes, and an improved approach to the overall automated generation of FFT cores. We also analyze and compare the proposed hardware design against a similar strategy reported in the recent literature in terms of clock latency, performance, and hardware resources. Evaluations were carried out on a Xilinx Virtex-7 FPGA (Field Programmable Gate Array) as the implementation target.
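As a rough illustration of what permutation-based address generation means, the sketch below computes read addresses for a stride-by-2 permutation, a permutation of the kind that appears in Kronecker-product formulations of radix-2 FFTs. It is not the procedure proposed here: the function names, the `lanes` parameter (standing in for the number of addresses issued per clock cycle in a folded design), and the chunking of the address stream are all assumptions made for this example.

```python
def stride2_read_address(i, n_bits):
    """Read address for output position i of a stride-by-2 permutation.

    For N = 2**n_bits points this equals (i mod N/2) * 2 + i // (N/2),
    which is a one-bit left circular rotation of the index.
    """
    mask = (1 << n_bits) - 1
    return ((i << 1) & mask) | (i >> (n_bits - 1))


def stride2_addresses(n_points):
    """Full read-address sequence for a power-of-2 transform size."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of 2 and at least 2")
    n_bits = n_points.bit_length() - 1
    return [stride2_read_address(i, n_bits) for i in range(n_points)]


def folded_schedule(n_points, lanes):
    """Split the address stream into per-cycle groups (hypothetical folding model).

    `lanes` is an assumed parameter: how many addresses are issued per cycle.
    """
    addrs = stride2_addresses(n_points)
    return [addrs[c:c + lanes] for c in range(0, n_points, lanes)]


if __name__ == "__main__":
    print(stride2_addresses(8))
    print(folded_schedule(8, 2))
```

Because the address is a bit rotation of a counter value, a generator of this kind needs only a counter and rewiring rather than a dedicated permutation network, which is the intuition behind the resource advantage of addressing schemes noted above.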
