Message passing for GPGPU clusters: CudaMPI

We present and analyze two new communication libraries, cudaMPI and glMPI, that provide an MPI-like message-passing interface for communicating data stored on the graphics cards of a distributed-memory parallel computer. These libraries can help applications that perform general-purpose computations on networked GPU clusters. We explore how to efficiently support both point-to-point and collective communication for either contiguous or noncontiguous data on modern graphics cards. Our software design is informed by a detailed analysis of the actual performance of modern graphics hardware, for which we develop and test a simple but useful performance model.
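To illustrate the general idea of such a library, the following is a minimal sketch of how a GPU-aware point-to-point send can be built on top of standard CUDA and MPI calls: device data is staged through a host buffer and then handed to MPI. The function name `gpu_send` and its exact signature are illustrative assumptions, not the actual cudaMPI API, which may overlap staging and transfer more aggressively.

```c
/* Hypothetical staged send: device memory -> host buffer -> MPI.
 * gpu_send and its parameter list are illustrative, not the real
 * cudaMPI interface described in the paper. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int gpu_send(const void *devbuf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    int typesize;
    MPI_Type_size(type, &typesize);           /* bytes per element */
    size_t bytes = (size_t)count * (size_t)typesize;

    void *hostbuf = malloc(bytes);
    if (hostbuf == NULL) return MPI_ERR_NO_MEM;

    /* Stage the GPU data into host memory first, since a plain
     * MPI implementation cannot read device pointers directly. */
    cudaMemcpy(hostbuf, devbuf, bytes, cudaMemcpyDeviceToHost);

    int rc = MPI_Send(hostbuf, count, type, dest, tag, comm);
    free(hostbuf);
    return rc;
}
```

A matching receive would do the reverse: `MPI_Recv` into a host buffer, then `cudaMemcpy` with `cudaMemcpyHostToDevice`. The cost of this extra copy across the PCIe bus is exactly the kind of overhead the paper's performance model is meant to capture.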
