Design of an application-specific instruction set processor for high-throughput and scalable FFT

Orthogonal frequency division multiplexing (OFDM)-based wireless communication standards place increasingly stringent requirements on the throughput and flexibility of the fast Fourier transform (FFT), a kernel data transformation task in communication systems. The application-specific instruction set processor (ASIP) has emerged as a promising solution to meet these requirements. In this paper, we propose a novel ASIP design tailored for FFT computation. We restructure the FFT computation flow into a scalable array structure based on an 8-point butterfly unit (BU). FFTs of any point size can be computed in the array structure, which expands easily along both the horizontal and vertical dimensions. We incorporate custom register files to reduce memory accesses. Because the data addresses for the custom registers change in each FFT stage, we derive a regular address changing (AC) rule. To match these microarchitecture modifications, we extend the instruction set with three custom instructions. Our FFT ASIP implementation improves data throughput by 866.5x, 5.9x, and 2.3x over a standard FFT software implementation, a TI DSP processor, and a commercial Xtensa ASIP, respectively. Meanwhile, the area and power consumption overhead of the custom hardware is negligible.
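As a rough illustration of how an 8-point butterfly can serve as the building block of a larger transform, the sketch below implements a textbook radix-8 decimation-in-time FFT in Python. It is not the array structure, AC rule, or custom instructions proposed in the paper, none of which are specified in the abstract. The function names `dft8` and `fft_radix8` are hypothetical, and the restriction to lengths that are powers of 8 is an assumption made here for brevity.

```python
import cmath

def dft8(x):
    """Direct 8-point DFT, standing in for one 8-point butterfly unit (BU)."""
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 8) for n in range(8))
            for k in range(8)]

def fft_radix8(x):
    """Radix-8 decimation-in-time FFT; len(x) must be a power of 8 (assumption)."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 8 != 0:
        raise ValueError("length must be a power of 8")
    m = n // 8
    # Split into eight interleaved sub-sequences and transform each recursively.
    subs = [fft_radix8(x[r::8]) for r in range(8)]
    out = [0j] * n
    for k in range(m):
        # Apply twiddle factors, then combine with a single 8-point butterfly.
        t = [subs[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(8)]
        y = dft8(t)
        for q in range(8):
            out[k + q * m] = y[q]
    return out
```

For a 64-point input, this performs two stages of eight 8-point butterflies each, which loosely mirrors the idea of repeating one BU across stages; how the paper maps those stages onto hardware and registers is not shown here.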
