Improving Performance of Matrix Multiplication and FFT on GPU
Yifeng Chen | Hong Mei | Xiang Cui
[1] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Steven G. Johnson, et al. The Fastest Fourier Transform in the West, 1997.
[3] Wen-mei W. Hwu, et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, 2008, PPoPP.
[4] Naga K. Govindaraju, et al. High performance discrete Fourier transforms on graphics processors, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Emilio L. Zapata, et al. Memory Locality Exploitation Strategies for FFT on the CUDA Architecture, 2008, VECPAR.
[6] V. Volkov, et al. Fitting FFT onto the G80 Architecture, 2008.