MPI Sessions: Evaluation of an Implementation in Open MPI

The recently proposed MPI Sessions extensions to the MPI standard present a new paradigm for applications to use MPI. MPI Sessions has the potential to address several limitations of MPI’s current specification: MPI cannot be initialized within an MPI process from different application components without a priori knowledge or coordination; MPI cannot be initialized more than once; and MPI cannot be reinitialized after finalization. MPI Sessions also gives individual components of an application more flexible ways to express the capabilities they require from MPI, at a finer granularity than is presently possible. At this time, MPI Sessions has reached sufficient maturity for implementation and evaluation, which are the focus of this paper. The paper presents a prototype implementation of MPI Sessions, discusses several of its performance characteristics, and describes its successful use in a large-scale production MPI application. Overall, MPI Sessions is shown to be implementable, integrable with key infrastructure, and effective, though with some overhead in MPI initialization and communicator construction. Small impacts on message-passing latency and throughput are also noted. Open MPI was used as the implementation vehicle, but the results are relevant to other middleware stacks as well.
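To illustrate the programming model the abstract describes, the following is a minimal sketch of the Sessions interface as later standardized in MPI 4.0; the prototype evaluated in the paper may differ in details. A component initializes its own session, derives a group from a named process set, and builds a communicator from that group without calling MPI_Init or touching MPI_COMM_WORLD. The process-set name "mpi://WORLD" is the MPI 4.0 built-in set; the tag string "org.example.sessions-demo" is purely illustrative.

```c
/* Minimal sketch of the MPI Sessions model (MPI 4.0 interface). */
#include <mpi.h>
#include <stdio.h>

int main(void)
{
    MPI_Session session = MPI_SESSION_NULL;
    MPI_Group   group   = MPI_GROUP_NULL;
    MPI_Comm    comm    = MPI_COMM_NULL;

    /* Each application component may initialize its own session; no
     * global MPI_Init() and no implicit MPI_COMM_WORLD are required. */
    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);

    /* Derive a group from a named process set ("mpi://WORLD" is the
     * built-in set of all processes), then create a communicator from it. */
    MPI_Group_from_session_pset(session, "mpi://WORLD", &group);
    MPI_Comm_create_from_group(group, "org.example.sessions-demo",
                               MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);

    int rank;
    MPI_Comm_rank(comm, &rank);
    printf("rank %d communicating via a session-derived communicator\n", rank);

    /* Tear down; unlike MPI_Finalize(), finalizing a session does not
     * preclude creating another session later in the same process. */
    MPI_Comm_free(&comm);
    MPI_Group_free(&group);
    MPI_Session_finalize(&session);
    return 0;
}
```

The overheads the paper measures arise in exactly these steps: session initialization and communicator construction from groups, as opposed to the one-time global setup performed by MPI_Init.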
