Message passing for GPGPU clusters: CudaMPI

We present and analyze two new communication libraries, cudaMPI and glMPI, that provide an MPI-like message-passing interface for communicating data stored on the graphics cards of a distributed-memory parallel computer. These libraries can help applications that perform general-purpose computations on networked GPU clusters. We explore how to efficiently support both point-to-point and collective communication for either contiguous or noncontiguous data on modern graphics cards. Our software design is informed by a detailed analysis of the actual performance of modern graphics hardware, for which we develop and test a simple but useful performance model.
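To illustrate the general idea of such a library, the following is a minimal sketch of how a GPU-aware point-to-point send can be built on top of standard CUDA and MPI calls: device data is staged through a host buffer and then handed to MPI. The function name `gpu_send` and its exact signature are illustrative assumptions, not the actual cudaMPI API, which may overlap staging and transfer more aggressively.

```c
/* Hypothetical staged send: device memory -> host buffer -> MPI.
 * gpu_send and its parameter list are illustrative, not the real
 * cudaMPI interface described in the paper. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int gpu_send(const void *devbuf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    int typesize;
    MPI_Type_size(type, &typesize);           /* bytes per element */
    size_t bytes = (size_t)count * (size_t)typesize;

    void *hostbuf = malloc(bytes);
    if (hostbuf == NULL) return MPI_ERR_NO_MEM;

    /* Stage the GPU data into host memory first, since a plain
     * MPI implementation cannot read device pointers directly. */
    cudaMemcpy(hostbuf, devbuf, bytes, cudaMemcpyDeviceToHost);

    int rc = MPI_Send(hostbuf, count, type, dest, tag, comm);
    free(hostbuf);
    return rc;
}
```

A matching receive would do the reverse: `MPI_Recv` into a host buffer, then `cudaMemcpy` with `cudaMemcpyHostToDevice`. The cost of this extra copy across the PCIe bus is exactly the kind of overhead the paper's performance model is meant to capture.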
