Scalable multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer
[1] Yifeng Chen,et al. Large-scale FFT on GPU clusters , 2010, ICS '10.
[2] Fumio Hirata. Molecular theory of solvation , 2003 .
[3] C. Loan. Computational Frameworks for the Fast Fourier Transform , 1992 .
[4] Dawid Pajak. General-Purpose Computation Using Graphics Hardware for Fast HDR Image Processing , 2007 .
[5] Satoshi Matsuoka,et al. Bandwidth intensive 3-D FFT kernel for GPUs using CUDA , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Yasuomi Kiyota,et al. A New Approach for Investigating the Molecular Recognition of Protein: Toward Structure-Based Drug Design Based on the 3D-RISM Theory. , 2011, Journal of chemical theory and computation.
[7] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[8] Stephen R. Comeau,et al. PIPER: An FFT‐based protein docking program with pairwise potentials , 2006, Proteins.
[9] Naga K. Govindaraju,et al. High performance discrete Fourier transforms on graphics processors , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] Christopher E. Cramer,et al. The Development and Integration of a Distributed 3D FFT for a Cluster of Workstations , 2000, Annual Linux Showcase & Conference.
[11] Naga K. Govindaraju,et al. Auto-tuning of fast Fourier transform on graphics processors , 2011, PPoPP '11.
[12] Fumio Hirata,et al. Ligand mapping on protein surfaces by the 3D-RISM theory: toward computational fragment-based drug design. , 2009, Journal of the American Chemical Society.
[13] V. Volkov,et al. Fitting FFT onto the G80 Architecture , 2008 .
[14] Katherine A. Yelick,et al. Optimizing bandwidth limited problems using one-sided communication and overlap , 2005, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[15] Z. Weng,et al. ZDOCK: An initial‐stage protein‐docking algorithm , 2003, Proteins.
[16] Robert S. Germain,et al. Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: Implementation and early performance measurements , 2005, IBM J. Res. Dev..
[17] J. Tukey,et al. An algorithm for the machine calculation of complex Fourier series , 1965 .
[18] Daisuke Takahashi. An Implementation of Parallel 3-D FFT with 2-D Decomposition on a Massively Parallel Cluster of Multi-core Processors , 2009, PPAM.
[19] Christophe Calvin,et al. Implementation of Parallel FFT Algorithms on Distributed Memory Machines with a Minimum Overhead of Communication , 1996, Parallel Comput..
[20] Kenneth Moreland,et al. The FFT on a GPU , 2003, HWWS '03.
[21] Satoshi Matsuoka,et al. High performance 3-D FFT using multiple CUDA GPUs , 2012, GPGPU-5.
[22] Mitsuo Yokokawa,et al. 16.4-Tflops Direct Numerical Simulation of Turbulence by a Fourier Spectral Method on the Earth Simulator , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[23] Richard Vuduc,et al. Prospects for scalable 3D FFTs on heterogeneous exascale systems , 2011 .
[24] Satoshi Matsuoka,et al. Auto-tuning 3-D FFT library for CUDA GPUs , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[25] Yasushi Negishi,et al. Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.