An efficient parallelization technique for high throughput FFT-ASIPs

The fast Fourier transform (FFT) and its inverse (IFFT) are used in orthogonal frequency division multiplexing (OFDM) systems for data (de)modulation. These transforms are the kernel tasks of an OFDM implementation and also its most processing-intensive ones. Recent trends in the consumer electronics market require OFDM implementations to be flexible, making a trade-off between area, energy efficiency, flexibility and timing a necessity. This has spurred the development of application-specific instruction-set processors (ASIPs) for FFT processing. Parallelization is an architectural parameter that significantly influences these design goals. This paper presents an analysis of the efficiency of parallelization techniques for an FFT-ASIP. It is shown that existing techniques are inefficient for high-throughput applications such as ultra-wideband (UWB) because of memory bottlenecks. Therefore, an interleaved execution technique that exploits temporal parallelism is proposed. With this technique, it is possible to meet the throughput requirement of UWB (409.6 Msamples/s) with only 4 non-trivial butterfly units for an ASIP that runs at 400 MHz.
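
The throughput figure can be sanity-checked with a back-of-the-envelope cycle count. The sketch below is illustrative only and rests on assumptions that are not stated in the abstract: a 128-point transform (the size used by multiband OFDM UWB), a plain radix-2 decomposition in which every butterfly is counted, and ideal utilization of the butterfly units with no memory stalls. The function names are hypothetical, and the paper's own algorithm and its count of non-trivial butterflies may differ.

```python
def cycle_budget(fft_size, sample_rate_hz, clock_hz):
    """Clock cycles available per FFT block at the required sample rate."""
    return fft_size * clock_hz / sample_rate_hz


def radix2_butterflies(fft_size):
    """Total butterflies in a radix-2 FFT (assumes fft_size is a power of two)."""
    stages = fft_size.bit_length() - 1  # log2(fft_size) for powers of two
    return (fft_size // 2) * stages


def min_cycles(fft_size, units):
    """Lower bound on cycles with `units` butterflies issued per cycle."""
    total = radix2_butterflies(fft_size)
    return -(-total // units)  # ceiling division


# Assumed parameters: 128-point FFT; rate and clock taken from the abstract.
N = 128
budget = cycle_budget(N, 409_600_000, 400_000_000)
needed = min_cycles(N, 4)
print(f"budget: {budget} cycles, ideal requirement: {needed} cycles")
print("fits" if needed <= budget else "does not fit")
```

Under these assumptions the arithmetic bound leaves only a small margin, so it is a necessary condition rather than a sufficient one: any cycle lost to memory access conflicts eats into that margin, which is consistent with the abstract's point that memory bottlenecks limit existing parallelization techniques.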
