Measuring Multithreaded Message Matching Misery

MPI usage patterns are changing as applications move toward fully multithreaded runtimes. However, the impact of these patterns on MPI message matching is not well studied. In particular, MPI's mechanism for receiver-side data placement, message matching, can be impacted by the increased message volume and nondeterminism incurred by multithreading. While there has been significant developer interest and work to provide an efficient MPI interface for multithreaded access, there has not been a study showing how multithreaded access affects message traffic and matching behavior. In this paper, we present a framework for studying the effects of multithreading on MPI message matching. This framework allows us to explore the implications of different common communication patterns and thread-level decompositions. We present a study of these impacts on the architectures of two Top 10 supercomputers (NERSC's Cori and LANL's Trinity). These data provide a baseline for evaluating reasonable matching-engine queue lengths, search depths, and queue drain times under the multithreaded model. Furthermore, the study highlights surprising results on the challenge that message matching poses for multithreaded application performance.
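To make the matching problem concrete, the sketch below (not taken from the paper; the thread count, message length, and per-thread tag scheme are illustrative assumptions) shows a common multithreaded pattern: under MPI_THREAD_MULTIPLE, each receiving thread posts its own tagged receive, so the posted-receive queue grows with the thread count and an incoming message may be compared against one entry per thread before it matches, while the sending threads run in scheduler order, making arrival order nondeterministic.

/*
 * Illustrative sketch only: multithreaded receives lengthen the MPI
 * posted-receive queue. NUM_THREADS and MSG_LEN are arbitrary choices.
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NUM_THREADS 8
#define MSG_LEN     1024

int main(int argc, char **argv)
{
    int provided, rank, size;

    /* Request full multithreaded access to MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "Run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    #pragma omp parallel num_threads(NUM_THREADS)
    {
        int tid = omp_get_thread_num();
        double buf[MSG_LEN];

        if (rank == 0) {
            /* Rank 0: every thread posts a receive with its own tag.
             * All NUM_THREADS entries sit in the posted-receive queue
             * until a matching send arrives, so an incoming message may
             * be searched against up to NUM_THREADS - 1 other entries. */
            MPI_Recv(buf, MSG_LEN, MPI_DOUBLE, 1, tid,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* Rank 1: threads send in whatever order the scheduler runs
             * them, so the arrival order at rank 0 is nondeterministic. */
            for (int i = 0; i < MSG_LEN; i++) buf[i] = (double)tid;
            MPI_Send(buf, MSG_LEN, MPI_DOUBLE, 0, tid, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}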
