Assessment of Graphic Processing Units (GPUs) for Department of Defense (DoD) Digi

Author(s): Owens, John D.; Sengupta, Shubhabrata; Horn, Daniel

Abstract: In this report we analyze the performance of the fast Fourier transform (FFT) on graphics hardware (the GPU), comparing it to FFTW, the best-in-class CPU implementation. We describe the FFT, the architecture of the GPU, and how general-purpose computation is structured on the GPU. We then identify the factors that influence FFT performance and describe several experiments that compare these factors between the CPU and the GPU. We conclude that the overheads of transferring data and initiating GPU computation are substantially higher than the corresponding costs on the CPU, and thus for latency-critical applications the CPU is the superior choice. We show that the CPU implementation is limited by computation, whereas the GPU implementation is limited by GPU memory bandwidth and its lack of a writable cache. The GPU is comparatively better suited to applications where FFT throughput matters most, namely larger FFTs with many FFTs computed in parallel; in these applications GPU and CPU performance is roughly on par. We also demonstrate that adding further computation to an application that includes the FFT, particularly computation that is GPU-friendly, gives the GPU an advantage over the CPU.
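
The abstract compares FFT implementations without showing the algorithm itself. As a point of reference only, the following is a minimal Python sketch of the classic radix-2 Cooley-Tukey FFT, assuming power-of-two input lengths, checked against a direct O(n^2) DFT. The names `fft`, `dft`, and `fft_batch` are hypothetical illustrations; they are not taken from the report, from FFTW, or from any GPU library, and the report's own CPU and GPU implementations are not reproduced here.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT (hypothetical sketch).

    Assumes len(x) is a power of two; raises ValueError otherwise.
    """
    n = len(x)
    # Reject empty input and non-power-of-two lengths up front.
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]
    # Divide: transform the even- and odd-indexed samples separately.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    half = n // 2
    for k in range(half):
        # Conquer: the twiddle factor e^{-2*pi*i*k/n} combines the halves.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(n^2) discrete Fourier transform, used only as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_batch(signals: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """Transform many independent signals one after another (hypothetical)."""
    return [fft(s) for s in signals]


if __name__ == "__main__":
    # An impulse transforms to a flat spectrum of ones.
    print(fft([1, 0, 0, 0]))
    # A constant signal concentrates all energy in the DC bin.
    print(fft([1, 1, 1, 1]))
```

`fft_batch` simply loops over independent inputs; it is meant only to name the "many FFTs computed in parallel" workload the abstract describes, and it says nothing about how a CPU library or a GPU would actually schedule that work. The pure-Python code has the expected O(n log n) operation count but is not a performance model for either processor.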
