Reduced Logic and Low-Power FFT Architectures for Embedded Systems

The Discrete Fourier Transform (DFT) is one of the core operations in digital signal processing and communication systems. Many fundamental algorithms, such as convolution, spectrum estimation, and correlation, can be realized with the DFT. Furthermore, the DFT is widely used in standard embedded system applications such as wireless communication protocols requiring Orthogonal Frequency Division Multiplexing (Wey et al., 2007) and radar image processing using Synthetic Aperture Radar (Fanucci et al., 1999). In practice, the DFT is difficult to implement directly due to its computational complexity. To reduce the amount of computation, Cooley and Tukey proposed the well-known Fast Fourier Transform (FFT) algorithm, which reduces the cost of an N-point DFT from O(N^2) to O((N/2) log2 N) operations (Proakis & Manolakis, 2006). Nevertheless, for embedded systems, in particular portable devices, efficient hardware realization of the FFT with small area, low power dissipation, and real-time computation is a significant challenge. The challenge is even more pronounced when FFTs with large transform lengths (>1024 points) need to be realized in embedded hardware. Therefore, the objective of this research is to investigate hardware-efficient FFT architectures, emphasizing compact, low-power embedded realizations.
As VLSI technology evolves, different architectures have been proposed for improving the performance and efficiency of FFT hardware. Pipelined architectures are widely used in FFT realization (Li & Wanhammar, 1999; He & Torkelson, 1996; Hopkinson & Butler, 1992; Yang et al., 2006) due to their speed advantages. Higher-radix (Hopkinson & Butler, 1992; Yang et al., 2006) and multi-butterfly (Bouguezel et al., 2004; X. Li et al., 2007) structures can also improve the performance of the FFT processor significantly, but these structures require substantially more hardware resources.
Alternatively, shared-memory schemes with a single butterfly calculation unit (Cohen, 1976; Ma, 1994, 1999; Ma & Wanhammar, 2000; Wang et al., 2007) are preferred in many embedded FFT processors since they require the least amount of hardware resources. Furthermore, an “in-place” addressing strategy is a practical choice to minimize the amount of data memory. With the “in-place” strategy, the two outputs of the butterfly unit are written back to the same memory locations as the two inputs, replacing the old data. For in-place FFT processing, two data reads and two data writes occur in every clock cycle. Multiple memory banks and conflict-free addressing logic are required to realize four data accesses in one clock cycle. Consequently, a typical FFT processor is composed of three major components: i) butterfly calculation units, ii) conflict-free address generators for both data and coefficient accesses, and iii) multi-bank memory units.
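The in-place strategy and the bank-conflict problem described above can be illustrated with a small software model. The following is a minimal sketch, not the architecture investigated in this work: it assumes a radix-2 decimation-in-time FFT and a two-bank memory in which the bank index is the parity of the address bits, a commonly described assignment in the spirit of the shared-memory schemes cited above. The function names (fft_in_place, parity, dft_direct) are hypothetical and chosen only for this illustration.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def parity(addr):
    """XOR of all address bits; used here as an assumed two-bank index."""
    return bin(addr).count("1") & 1


def fft_in_place(x):
    """Radix-2 decimation-in-time FFT that overwrites the list x.

    Returns (butterflies, conflicts): the number of butterflies executed
    and the number of butterflies whose two operands map to the same bank.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    bits = n.bit_length() - 1

    # Bit-reversed reordering so that the outputs come out in natural order.
    for i in range(n):
        j = bit_reverse(i, bits)
        if i < j:
            x[i], x[j] = x[j], x[i]

    butterflies = 0
    conflicts = 0
    half = 1  # distance between the two butterfly operands in this stage
    while half < n:
        for start in range(0, n, 2 * half):
            for k in range(half):
                a = start + k
                b = a + half
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))  # twiddle factor
                t = w * x[b]
                # In-place update: both results overwrite their own inputs.
                x[a], x[b] = x[a] + t, x[a] - t
                butterflies += 1
                if parity(a) == parity(b):
                    conflicts += 1
        half *= 2
    return butterflies, conflicts


def dft_direct(x):
    """Direct O(N^2) DFT, used only as a reference for checking results."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


data = [complex(v) for v in range(8)]
reference = dft_direct(data)
count, conflicts = fft_in_place(data)
print(count, conflicts)  # 12 butterflies, i.e. (N/2) log2 N for N = 8, and 0 conflicts
```

In this model the two operand addresses of every butterfly differ in exactly one bit, so their parities always differ and the two reads (and the two write-backs) never target the same bank. A hardware design would additionally have to schedule the reads of one butterfly against the writes of another, which this sketch does not model.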

[1]  Jack E. Volder.  The CORDIC Trigonometric Computing Technique , 1959, IRE Trans. Electron. Comput..

[2]  Xiaojin Li,et al.  A Low Power and Small Area FFT Processor for OFDM Demodulator , 2007, IEEE Transactions on Consumer Electronics.

[3]  Mats Torkelson,et al.  A new approach to pipeline FFT processor , 1996, Proceedings of International Conference on Parallel Processing.

[4]  M. Omair Ahmad,et al.  A new radix-2/8 FFT algorithm for length-q×2^m DFTs , 2004, IEEE Trans. Circuits Syst. I Regul. Pap..

[5]  Yoshikazu Miyanaga,et al.  Low power FFT design for wireless communication systems , 2009, 2008 International Symposium on Intelligent Signal Processing and Communications Systems.

[6]  Jesús Grajal,et al.  Efficient Memoryless Cordic for FFT Computation , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[7]  John G. Proakis,et al.  Digital Signal Processing: Principles, Algorithms, and Applications , 1992 .

[8]  Lars Wanhammar,et al.  A hardware efficient control of memory addressing for high-performance FFT processors , 2000, IEEE Trans. Signal Process..

[9]  Y. Wang,et al.  Novel Memory Reference Reduction Methods for FFT Implementations on DSP Processors , 2007, IEEE Transactions on Signal Processing.

[10]  Erdal Oruklu,et al.  Reduced memory architecture for CORDIC-based FFT , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[11]  D. Cohen.  Simplified control of FFT hardware , 1976 .

[12]  T. M. Hopkinson,et al.  A pipelined, high-precision FFT architecture , 1992, [1992] Proceedings of the 35th Midwest Symposium on Circuits and Systems.

[13]  Yutai Ma,et al.  An effective memory addressing scheme for FFT processors , 1999, IEEE Trans. Signal Process..

[14]  Erdal Oruklu,et al.  An Efficient FFT Engine With Reduced Addressing Logic , 2008, IEEE Transactions on Circuits and Systems II: Express Briefs.

[15]  An-Yeu Wu,et al.  Mixed-scaling-rotation CORDIC (MSR-CORDIC) algorithm and architecture for high-performance vector rotational DSP applications , 2005, IEEE Transactions on Circuits and Systems I: Regular Papers.

[16]  Tom E. Bishop,et al.  Blind Image Restoration Using a Block-Stationary Signal Model , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[17]  Han Yue-qiu.  A Fast Address Generation Scheme for FFT Processor , 2006 .

[18]  Erdal Oruklu,et al.  Fast memory addressing scheme for radix-4 FFT implementation , 2009, 2009 IEEE International Conference on Electro/Information Technology.

[19]  John S. Thompson,et al.  A novel coefficient ordering based low power pipelined radix-4 FFT processor for wireless LAN applications , 2003, IEEE Trans. Consumer Electron..

[20]  Luca Fanucci,et al.  Single-chip mixed-radix FFT processor for real-time on-board SAR processing , 1999, ICECS'99. Proceedings of ICECS '99. 6th IEEE International Conference on Electronics, Circuits and Systems (Cat. No.99EX357).

[21]  Liang Yang,et al.  An efficient locally pipelined FFT processor , 2006, IEEE Trans. Circuits Syst. II Express Briefs.

[22]  Alvin M. Despain,et al.  Fourier Transform Computers Using CORDIC Iterations , 1974, IEEE Transactions on Computers.

[23]  Chin-Long Wey,et al.  Efficient memory-based FFT processors for OFDM applications , 2007, 2007 IEEE International Conference on Electro/Information Technology.

[24]  Weidong Li,et al.  A pipeline FFT processor , 1999, 1999 IEEE Workshop on Signal Processing Systems. SiPS 99. Design and Implementation (Cat. No.99TH8461).

[25]  Richard M. Jiang,et al.  An Area-Efficient FFT Architecture for OFDM Digital Video Broadcasting , 2007, IEEE Transactions on Consumer Electronics.

[26]  L. Wanhammar,et al.  A coefficient access control for low power FFT processors , 1999, 42nd Midwest Symposium on Circuits and Systems (Cat. No.99CH36356).

[27]  Jacob A. Abraham,et al.  A high throughput FFT processor with no multipliers , 2009, 2009 IEEE International Conference on Computer Design.

[28]  Luca Fanucci,et al.  Low-power FFT/IFFT VLSI macro cell for scalable broadband VDSL modem , 2003, The 3rd IEEE International Workshop on System-on-Chip for Real-Time Applications, 2003. Proceedings..