Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
Toshio Endo | Akira Nukada | Yasuhiko Ogata | Satoshi Matsuoka
[1] J. Tukey, et al. An algorithm for the machine calculation of complex Fourier series, 1965.
[2] J. J. Lambiotte, et al. Computing the Fast Fourier Transform on a vector computer, 1979.
[3] Paul N. Swarztrauber, et al. FFT algorithms for vector computers, 1984, Parallel Comput.
[4] C. Loan. Computational Frameworks for the Fast Fourier Transform, 1992.
[5] S. Goedecker. Rotating a three-dimensional array in an optimal position for vector processing: case study for a three-dimensional fast Fourier transform, 1993.
[6] Ramesh C. Agarwal, et al. An efficient parallel algorithm for the 3-D FFT NAS parallel benchmark, 1994, Proceedings of IEEE Scalable High Performance Computing Conference.
[7] Markus Hegland. Real and Complex Fast Fourier Transforms on the Fujitsu VPP 500, 1996, Parallel Comput.
[8] David K. McAllister, et al. Fast Matrix Multiplies Using Graphics Hardware, 2001, ACM/IEEE SC 2001 Conference (SC'01).
[9] Mitsuo Yokokawa, et al. 16.4-Tflops Direct Numerical Simulation of Turbulence by a Fourier Spectral Method on the Earth Simulator, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[10] Kenneth Moreland, et al. The FFT on a GPU, 2003, HWWS '03.
[11] Daisuke Takahashi. Efficient implementation of parallel three-dimensional FFT on clusters of PCs, 2003.
[12] William R. Mark, et al. Cg: a system for programming graphics hardware in a C-like language, 2003, ACM Trans. Graph.
[13] Z. Weng, et al. ZDOCK: An initial‐stage protein‐docking algorithm, 2003, Proteins.
[14] Pat Hanrahan, et al. Brook for GPUs: stream computing on graphics hardware, 2004, SIGGRAPH 2004.
[15] Steven G. Johnson, et al. The Design and Implementation of FFTW3, 2005, Proceedings of the IEEE.
[16] N.K. Govindaraju, et al. A Memory Model for Scientific Algorithms on Graphics Processors, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[17] David Tarditi, et al. Accelerator: using data parallelism to program GPUs for general-purpose uses, 2006, ASPLOS XII.
[18] Mark J. Stock, et al. Toward efficient GPU-accelerated N-body simulations, 2008.
[19] Satoshi Matsuoka, et al. An efficient, model-based CPU-GPU heterogeneous FFT library, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[20] Emilio L. Zapata, et al. Memory Locality Exploitation Strategies for FFT on the CUDA Architecture, 2008, VECPAR.