The fast Fourier transform (FFT) is one of the fundamental processing blocks used in many signal processing applications (e.g., orthogonal frequency-division multiplexing in wireless telecommunication). Therefore, every proposal that reduces the latency, resource usage, or numerical error of an FFT implementation counts. This paper proposes an implementation of butterfly processing elements (BPEs) in which the radix-r butterfly computation is formulated as a combination of α radix-2 butterflies implemented in parallel. An efficient FFT implementation is feasible using the proposed multiplexed and pipelined BPE. Compared with a state-of-the-art reference based on pipelined and parallel FFT structures, an FPGA-based implementation shows that the maximum throughput is improved by a factor of 1.3 for a 256-point FFT, reaching 2680 MSps on a Virtex-7. The analysis also covers key performance metrics such as throughput, latency, and resource utilization.
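The idea of building a higher-radix butterfly from radix-2 butterflies can be illustrated with the textbook radix-4 case. The sketch below is not the paper's BPE: it only shows, in software, how a 4-point butterfly decomposes into two stages of two independent radix-2 butterflies each, with a single trivial twiddle factor (-j) in between. The function names (`radix2_butterfly`, `radix4_butterfly`, `direct_dft`) are hypothetical, and the multiplexing and pipelining described in the abstract are not modeled.

```python
import cmath


def radix2_butterfly(a, b):
    """Basic radix-2 butterfly: returns (a + b, a - b)."""
    return a + b, a - b


def radix4_butterfly(x0, x1, x2, x3):
    """4-point DFT built from four radix-2 butterflies (two per stage)."""
    # Stage 1: two independent radix-2 butterflies; in hardware these
    # could operate in parallel.
    s0, d0 = radix2_butterfly(x0, x2)
    s1, d1 = radix2_butterfly(x1, x3)
    # Trivial twiddle factor W_4^1 = -j on one branch (no real multiplier
    # is needed: it is a swap of real/imaginary parts with a sign change).
    d1 = -1j * d1
    # Stage 2: two more independent radix-2 butterflies.
    X0, X2 = radix2_butterfly(s0, s1)
    X1, X3 = radix2_butterfly(d0, d1)
    return [X0, X1, X2, X3]


def direct_dft(x):
    """Reference O(n^2) DFT used only to check the butterfly output."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]


if __name__ == "__main__":
    sample = [1 + 2j, -3 + 0.5j, 0.25 - 1j, 2 + 2j]
    fast = radix4_butterfly(*sample)
    ref = direct_dft(sample)
    # The two results should agree up to floating-point rounding.
    print(max(abs(f - r) for f, r in zip(fast, ref)))
```

Each stage contains butterflies with no data dependency on one another, which is the property a parallel hardware realization would exploit; how the paper maps this onto its α parameter and its multiplexed, pipelined structure is not reproduced here.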