Accelerating the Nonuniform Fast Fourier Transform Using FPGAs

We present an FPGA accelerator for the Nonuniform Fast Fourier Transform (NuFFT), a technique for reconstructing images from arbitrarily sampled data. We accelerate the compute-intensive interpolation step of the NuFFT gridding algorithm by implementing it on an FPGA. To ensure efficient memory performance, we present a novel FPGA implementation of geometric-tiling-based sorting of the arbitrary samples. The convolution is then performed by a novel Data Translation architecture composed of a multi-port local memory, a dynamic coordinate generator, and a plug-and-play kernel pipeline. Our implementation uses single-precision floating point and has been ported to the BEE3 platform. Experimental results show that our FPGA implementation delivers high performance without sacrificing flexibility across data sizes and kernel functions. We demonstrate up to 8x speedup and up to 27x higher performance-per-watt over a comparable CPU implementation, and up to 20% higher performance-per-watt than a relevant GPU implementation.
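The two steps named in the abstract, tile-based sorting of the samples followed by convolution onto a uniform grid, can be illustrated with a small software sketch. This is not the FPGA architecture described above: the function names (`tile_sort`, `grid_samples`, `gaussian_kernel`), the Gaussian kernel, the kernel width, the tile size, and the periodic wrap-around at the grid edges are all assumptions chosen for illustration. The kernel is passed as a parameter only to mirror the idea of an exchangeable kernel function.

```python
import math

def tile_sort(samples, tile):
    """Order (x, y, value) samples by the square tile that contains them.

    Samples falling in the same tile become contiguous, so consecutive
    grid updates touch nearby memory. The tile size is an assumed parameter.
    """
    return sorted(samples, key=lambda s: (int(s[1] // tile), int(s[0] // tile)))

def gaussian_kernel(distance, width):
    """Hypothetical separable kernel; any windowed function could be swapped in."""
    sigma = width / 4.0
    return math.exp(-0.5 * (distance / sigma) ** 2)

def grid_samples(samples, n, width=4, kernel=gaussian_kernel):
    """Spread each nonuniform sample onto an n-by-n uniform grid.

    Every sample contributes to the grid points within width/2 of its
    position, weighted by the separable kernel. Indices wrap periodically.
    """
    grid = [[0.0] * n for _ in range(n)]
    half = width / 2.0
    for x, y, value in samples:
        for gy in range(math.ceil(y - half), math.floor(y + half) + 1):
            wy = kernel(gy - y, width)
            for gx in range(math.ceil(x - half), math.floor(x + half) + 1):
                wx = kernel(gx - x, width)
                grid[gy % n][gx % n] += value * wx * wy
    return grid

# Usage: sort by 4x4 tiles, then grid onto an 8x8 array.
samples = [(5.5, 1.2, 1.0), (0.3, 0.4, 2.0), (6.1, 6.9, 3.0)]
ordered = tile_sort(samples, tile=4)
result = grid_samples(ordered, n=8)
```

Sorting changes only the order in which grid points are updated, not the accumulated values (up to floating-point rounding), which is why it can be used purely as a memory-locality optimization.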
