RMA-MT: A Benchmark Suite for Assessing MPI Multi-threaded RMA Performance

Reaching Exascale will require massive parallelism combined with asynchronous communication to achieve scalability at such levels of concurrency. MPI is a strong candidate for providing the communication mechanisms needed at these scales. Two existing MPI mechanisms are particularly relevant to Exascale: multi-threading, to support massive concurrency, and Remote Memory Access (RMA), to support asynchronous communication. Unfortunately, multi-threaded MPI RMA code has not been studied extensively, in part because no public benchmarks or proxy applications exist to assess its performance. The contributions of this paper are the design and demonstration of the first publicly available proxy applications and micro-benchmark suite for multi-threaded MPI RMA, a study of the multi-threaded RMA performance of several MPI implementations, and an evaluation of how these benchmarks can be used to test development for both performance and correctness.
