Two-dimensional FFT architecture based on radix-4<sup>3</sup> algorithm with efficient output reordering

In this paper we present a 64 × 64-point 2D FFT architecture that uses a parallel unrolled radix-4<sup>3</sup> (R4<sup>3</sup>) FFT as its basic block. Our R4<sup>3</sup> block is a memory-optimized parallel architecture that computes a 64-point FFT with minimal execution time. The 2D FFT is computed by row-column decomposition using two R4<sup>3</sup> blocks. The proposed architecture has been implemented in UMC 40 nm CMOS technology with a clock frequency of 500 MHz, an area of 0.841 mm<sup>2</sup>, and a power consumption of 358 mW. The computation time for a 64 × 64 FFT is 8.19 μs. ASIC results show that our FFT achieves a shorter computation time than state-of-the-art implementations.
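The row-column decomposition mentioned above can be illustrated in software. The following is a minimal sketch, not the hardware architecture itself: it assumes a plain recursive radix-4 decimation-in-time FFT (three radix-4 levels for 64 = 4<sup>3</sup> points) and applies it first along rows, then along columns. The function names `fft_radix4` and `fft2_row_column` are hypothetical and chosen only for this example; the parallel unrolling, memory organization, and output reordering of the proposed design are not modeled.

```python
import cmath

# Powers of W_4 = exp(-2*pi*i/4) = -i, used inside the radix-4 butterfly.
W4 = [1 + 0j, -1j, -1 + 0j, 1j]


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT (illustrative sketch).

    The input length must be a power of 4; a 64-point input is handled
    by three levels of radix-4 decomposition (64 = 4**3).
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    quarter = n // 4
    # Four interleaved sub-sequences x[r], x[r+4], x[r+8], ...
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(quarter):
        # Apply twiddle factors W_n^(r*k) to the sub-transform outputs.
        t = [subs[r][k] * cmath.exp(-2j * cmath.pi * r * k / n)
             for r in range(4)]
        # Radix-4 butterfly: combine with powers of W_4.
        for q in range(4):
            out[k + q * quarter] = sum(t[r] * W4[(r * q) % 4]
                                       for r in range(4))
    return out


def fft2_row_column(a):
    """2D FFT by row-column decomposition of a square matrix.

    Step 1: 1D FFT of every row.
    Step 2: 1D FFT of every column of the intermediate result.
    """
    row_pass = [fft_radix4(list(row)) for row in a]
    col_pass = [fft_radix4(list(col)) for col in zip(*row_pass)]
    # col_pass holds transformed columns; transpose back to row-major order.
    return [list(row) for row in zip(*col_pass)]


if __name__ == "__main__":
    n = 64
    # A single 2D complex tone; its spectrum has one non-zero bin at (3, 5).
    tone = [[cmath.exp(2j * cmath.pi * (3 * m + 5 * c) / n)
             for c in range(n)] for m in range(n)]
    spectrum = fft2_row_column(tone)
    print(abs(spectrum[3][5]))
```

As a simple arithmetic cross-check on the figures in the abstract, 8.19 μs at 500 MHz corresponds to about 4095 clock cycles, which is close to 64 × 64 = 4096.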
