Large-scale FFT on GPU clusters
Yifeng Chen | Hong Mei | Xiang Cui
[1] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Heike Jagode. Fourier Transforms for the BlueGene/L Communication Network, 2006.
[3] Satoshi Matsuoka, et al. Bandwidth intensive 3-D FFT kernel for GPUs using CUDA, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Satoshi Matsuoka, et al. Auto-tuning 3-D FFT library for CUDA GPUs, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[5] V. Volkov, et al. Fitting FFT onto the G80 Architecture, 2008.
[6] Naga K. Govindaraju, et al. High performance discrete Fourier transforms on graphics processors, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] Wen-mei W. Hwu, et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, 2008, PPoPP.
[8] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, HiPC 2008.
[9] Paulius Micikevicius, et al. 3D finite difference computation on GPUs using CUDA, 2009, GPGPU-2.
[10] Ramesh C. Agarwal, et al. A high performance parallel algorithm for 1-D FFT, 1994, Proceedings of Supercomputing '94.
[11] Massimiliano Fatica. Accelerating linpack with CUDA on heterogenous clusters, 2009, GPGPU-2.
[12] Yifeng Chen, et al. Logic of global synchrony, 2001, TOPL.
[13] Naga K. Govindaraju, et al. High performance discrete Fourier transforms on graphics processors, 2008, HiPC 2008.
[14] Yifeng Chen, et al. Improving Performance of Matrix Multiplication and FFT on GPU, 2009, 2009 15th International Conference on Parallel and Distributed Systems.