A New Approach to MPI Collective Communication Implementations

Recent research into the optimization of MPI collective operations has produced a wide variety of algorithms and corresponding implementations. Each is typically applicable only in a relatively narrow scope: a specific architecture, a specific network, a specific number of processes, a specific data size or datatype, or some combination of these and other factors. This situation presents an enormous challenge to portable MPI implementations, which are expected to provide optimized collective performance on all platforms. Many portable implementations supply only a token number of algorithms intended to perform well on most systems, so many platform configurations are still left without well-tuned collective operations. This paper proposes a framework that accommodates a wide variety of collective algorithm implementations, together with a flexible, multi-tiered selection process that chooses which implementation to use when an application invokes an MPI collective function.
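To make the idea of a tiered selection process concrete, the following is a minimal sketch and not the framework proposed in the paper. It assumes two tiers: an applicability filter over the call's context, then a priority ranking among the survivors, with a generic fallback. All names (`CallContext`, `Candidate`, `select`, the algorithm labels, the network label) and all thresholds and priorities are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CallContext:
    """Factors known when a collective is invoked (hypothetical set)."""
    num_procs: int
    msg_bytes: int
    network: str


@dataclass
class Candidate:
    """One collective algorithm implementation offered to the selector."""
    name: str
    applies: Callable[[CallContext], bool]  # tier 1: is it usable here?
    priority: int                           # tier 2: rank among usable ones


def select(candidates: List[Candidate], ctx: CallContext,
           fallback: str = "generic_linear") -> str:
    # Tier 1: keep only the implementations that declare themselves applicable.
    eligible = [c for c in candidates if c.applies(ctx)]
    if not eligible:
        # No specialized implementation fits; use the portable default.
        return fallback
    # Tier 2: among applicable implementations, take the highest priority.
    return max(eligible, key=lambda c: c.priority).name


# Illustrative broadcast candidates; thresholds and priorities are made up.
BCAST = [
    Candidate("binomial_tree", lambda c: c.msg_bytes <= 8192, 10),
    Candidate("scatter_allgather",
              lambda c: c.msg_bytes > 8192 and c.num_procs >= 8, 20),
    Candidate("hw_multicast",
              lambda c: c.network == "hypothetical_mcast_net", 50),
]
```

For example, `select(BCAST, CallContext(16, 1024, "tcp"))` picks the small-message candidate, while a large message on only four processes falls through to the fallback. A real implementation would likely make the decision once per communicator or cache it rather than re-evaluating on every call.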
