High performance 3-D FFT using multiple CUDA GPUs

The fast Fourier transform (FFT) is one of the most important computations and is used in many kinds of applications. Although there are several works on single-GPU FFT, large-scale transforms require multiple GPUs because of the limited capacity of device memory. We present a high-performance 3-D FFT that uses multiple GPU devices, both on a single node and across multiple nodes. By optimizing the data transfer between GPUs, our multi-GPU FFT outperforms the single-GPU FFT.
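
To illustrate why inter-GPU data transfer matters for a distributed 3-D FFT, the following is a minimal sketch of a slab decomposition, one common way to split a 3-D transform across devices. The abstract does not state which decomposition is used, so this choice is an assumption. The "devices" are simulated with plain Python lists, a naive DFT stands in for the per-device FFT kernel, and the names `dft`, `dft2_plane`, and `slab_fft3d` are hypothetical.

```python
import cmath

def dft(xs):
    """Naive O(n^2) 1-D DFT; a stand-in for a per-device FFT kernel."""
    n = len(xs)
    return [sum(xs[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft2_plane(plane):
    """2-D DFT of one plane indexed as plane[y][z]."""
    rows = [dft(row) for row in plane]              # transform along z
    cols = [dft(list(col)) for col in zip(*rows)]   # transform along y
    return [list(r) for r in zip(*cols)]            # back to [y][z] layout

def slab_fft3d(a, num_devices):
    """3-D DFT of a[x][y][z] with data split into slabs over simulated devices."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    assert nx % num_devices == 0 and ny % num_devices == 0
    sx, sy = nx // num_devices, ny // num_devices

    # Step 1: each device owns sx consecutive x-planes and transforms
    # them along y and z without any communication.
    slabs = [[dft2_plane(a[x]) for x in range(d * sx, (d + 1) * sx)]
             for d in range(num_devices)]

    # Step 2: all-to-all exchange. Device d gathers its y-range from every
    # x-plane, so each device sends data to every other device. On real
    # hardware this is the inter-GPU transfer step.
    blocks = [[[slabs[x // sx][x % sx][y] for y in range(d * sy, (d + 1) * sy)]
               for x in range(nx)]
              for d in range(num_devices)]

    # Step 3: each device now holds complete x-lines and transforms along x.
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for d, block in enumerate(blocks):
        for yl in range(sy):
            for z in range(nz):
                line = dft([block[x][yl][z] for x in range(nx)])
                for x in range(nx):
                    out[x][d * sy + yl][z] = line[x]
    return out
```

In this sketch, steps 1 and 3 are purely local, while step 2 moves almost the entire data set between devices. That exchange is the part a multi-GPU implementation would need to optimize.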
