Auto-tuning 3-D FFT library for CUDA GPUs
[1] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[2] N.K. Govindaraju, et al. A Memory Model for Scientific Algorithms on Graphics Processors, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[3] Steven G. Johnson, et al. The Design and Implementation of FFTW3, 2005, Proceedings of the IEEE.
[4] C. Loan. Computational Frameworks for the Fast Fourier Transform, 1992.
[5] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Jack J. Dongarra, et al. Automated empirical optimizations of software and the ATLAS project, 2001, Parallel Comput.
[7] Mark J. Stock, et al. Toward efficient GPU-accelerated N-body simulations, 2008.
[8] Kenneth Moreland, et al. The FFT on a GPU, 2003, HWWS '03.
[9] Naga K. Govindaraju, et al. High performance discrete Fourier transforms on graphics processors, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[11] Dawid Pajak. General-Purpose Computation Using Graphics Hardware for Fast HDR Image Processing, 2007.
[12] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Satoshi Matsuoka, et al. Bandwidth intensive 3-D FFT kernel for GPUs using CUDA, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] V. Volkov, et al. Fitting FFT onto the G80 Architecture, 2008.
[15] David K. McAllister, et al. Fast Matrix Multiplies Using Graphics Hardware, 2001, ACM/IEEE SC 2001 Conference (SC'01).
[16] J. Tukey, et al. An algorithm for the machine calculation of complex Fourier series, 1965.
[17] Satoshi Matsuoka. The Rise of the Commodity Vectors, 2008, VECPAR.
[18] J. Xu. OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems, 2009.