A Case for Standard Non-blocking Collective Operations

In this paper we make the case for adding non-blocking collective operations to the MPI standard. The non-blocking point-to-point operations and the blocking collective operations currently defined by MPI each provide important benefits: non-blocking point-to-point operations let an application overlap communication with computation, while collective operations offer a high-level abstraction of group communication that implementations can optimize. To allow these benefits to be realized simultaneously, we present an application programming interface for non-blocking collective operations in MPI. Microbenchmark and application-based performance results demonstrate that non-blocking collective operations offer not only greater convenience but also better performance than manually combining threads with blocking collectives.
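The performance argument rests on overlap. The sketch below is an idealized cost model that makes this concrete. It is a simplifying assumption, not the paper's benchmarking methodology. The model assumes a blocking collective serializes communication and computation, while a non-blocking collective lets them proceed concurrently at the price of a fixed CPU overhead for starting and progressing the operation. The function names and the `t_overhead` parameter are hypothetical.

```python
"""Idealized (hypothetical) cost model for overlapping a collective with computation.

Assumption: with a blocking collective a rank first waits for the collective
and then computes; with a non-blocking collective it starts the operation,
computes independently, then waits, paying a fixed progression overhead.
"""


def blocking_time(t_comm: float, t_comp: float) -> float:
    """Blocking collective: communication and computation are serialized."""
    return t_comm + t_comp


def nonblocking_time(t_comm: float, t_comp: float, t_overhead: float = 0.0) -> float:
    """Non-blocking collective: in the best case the shorter phase is fully hidden.

    t_overhead models CPU time spent initiating and progressing the operation.
    """
    return max(t_comm, t_comp) + t_overhead


def speedup(t_comm: float, t_comp: float, t_overhead: float = 0.0) -> float:
    """Ratio of blocking to non-blocking time under this idealized model."""
    return blocking_time(t_comm, t_comp) / nonblocking_time(t_comm, t_comp, t_overhead)


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from the paper.
    for t_comm, t_comp, t_ovh in [(1.0, 1.0, 0.0), (1.0, 1.0, 0.1), (2.0, 0.5, 0.1)]:
        print(
            f"comm={t_comm} comp={t_comp} overhead={t_ovh}: "
            f"blocking={blocking_time(t_comm, t_comp):.2f} "
            f"non-blocking={nonblocking_time(t_comm, t_comp, t_ovh):.2f} "
            f"speedup={speedup(t_comm, t_comp, t_ovh):.2f}x"
        )
```

Under this model the speedup can never exceed 2, and that bound is reached only when communication and computation take equal time and the overhead is zero. This is why the overhead of the mechanism that drives progress matters when non-blocking collectives are compared with manually threaded blocking collectives.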
