Generating FPGA-Accelerated DFT Libraries

We present a domain-specific approach to generating high-performance, hardware-software partitioned implementations of the discrete Fourier transform (DFT) in fixed-point precision. The partitioning strategy is a heuristic based on the DFT's divide-and-conquer algorithmic structure and is fine-tuned by feedback-driven exploration of candidate designs. We have integrated this approach into the Spiral linear-transform code-generation framework to support push-button automatic implementation. We evaluate hardware-software DFT implementations running on the embedded PowerPC processor and the reconfigurable fabric of the Xilinx Virtex-II Pro FPGA. In our experiments, the FPGA-accelerated 1D and 2D DFT libraries achieve between 2 and 7.5 times higher performance (operations per second) and up to 2.5 times better energy efficiency (operations per Joule) than the software-only versions.
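To illustrate how a divide-and-conquer structure lends itself to hardware-software partitioning, the following is a minimal sketch and not the paper's actual heuristic. It assumes a radix-2 Cooley-Tukey recursion on power-of-two sizes, uses floating-point complex arithmetic rather than fixed point, and models the FPGA core with a hypothetical software stub (`dft_direct`). The names `dft_partitioned`, `hw_size`, and `stats` are introduced here for illustration only.

```python
import cmath

def dft_direct(x):
    """Direct O(n^2) DFT. Hypothetical stand-in for a fixed-size FPGA DFT core."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft_partitioned(x, hw_size, stats):
    """Radix-2 Cooley-Tukey DFT for power-of-two lengths.

    Sub-transforms of size <= hw_size are handed to the 'hardware' stub;
    the remaining recursion levels (twiddles and butterflies) run in software.
    """
    n = len(x)
    if n <= hw_size:
        stats["hw_calls"] += 1          # one invocation of the assumed hardware core
        return dft_direct(x)

    # Software side: split into even/odd samples and recurse.
    even = dft_partitioned(x[0::2], hw_size, stats)
    odd = dft_partitioned(x[1::2], hw_size, stats)

    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    stats["sw_butterflies"] += half     # butterflies computed in software
    return out

if __name__ == "__main__":
    x = [complex(i, -i) for i in range(16)]
    stats = {"hw_calls": 0, "sw_butterflies": 0}
    y = dft_partitioned(x, hw_size=4, stats=stats)
    print(stats)
```

Varying `hw_size` in this sketch moves the hardware-software boundary up or down the recursion tree, which is the kind of design parameter a feedback-driven search could explore against measured runtime or energy.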
