Accelerating MPI Message Matching by a Data Clustering Strategy

The Message Passing Interface (MPI) is one of the most popular parallel programming models for high-performance computing. In MPI, message matching operations lie on the critical path of communication and can adversely affect application performance. MPI libraries typically maintain early posted receives and unexpected messages in linked-list data structures for matching; however, these perform poorly at scale due to long message queue traversals. In this paper, we propose an MPI message matching mechanism based on K-means clustering that uses application behavior to group the communicating peers into clusters and assigns a dedicated queue to each cluster. The clustering is based on the number of queue elements each communicating process adds to the posted receive or unexpected message queue at runtime. The proposed approach also provides an opportunity to parallelize the search operation for different processes based on the application's message queue characteristics. An experimental study with real applications confirms that the proposed message matching approach reduces the number of queue traversals, the queue search time, and the application runtime.
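The core idea above can be sketched in a few lines. The following is a hypothetical illustration, not the authors' implementation: ranks are clustered with a simple 1-D Lloyd's k-means over their queue-element counts, and each cluster gets its own queue, so a match for a given source rank only traverses that cluster's (shorter) list. The class and function names (`ClusteredQueues`, `kmeans_1d`) and the `(source, tag)` entry format are assumptions for the sketch.

```python
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """Lloyd's k-means on scalar queue-element counts.
    Returns, for each input value, the id of its assigned cluster."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)          # pick k initial centers
    assign = [0] * len(values)
    for _ in range(iters):
        # assignment step: nearest center by absolute distance
        for i, v in enumerate(values):
            assign[i] = min(range(k), key=lambda c: abs(v - centers[c]))
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [v for i, v in enumerate(values) if assign[i] == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign

class ClusteredQueues:
    """Per-cluster message queues: ranks that contribute similar numbers
    of queue elements share one queue (hypothetical sketch)."""
    def __init__(self, elems_per_rank, k):
        ranks = sorted(elems_per_rank)
        counts = [elems_per_rank[r] for r in ranks]
        assign = kmeans_1d(counts, k)
        self.cluster_of = dict(zip(ranks, assign))  # rank -> cluster id
        self.queues = [[] for _ in range(k)]        # one list per cluster

    def post(self, src, tag):
        """Append a (source, tag) entry to the source rank's cluster queue."""
        self.queues[self.cluster_of[src]].append((src, tag))

    def match(self, src, tag):
        """Search only the source rank's cluster queue; remove and
        return the first match, or None if absent."""
        q = self.queues[self.cluster_of[src]]
        for i, (s, t) in enumerate(q):
            if s == src and t == tag:
                return q.pop(i)
        return None
```

Because each incoming message carries its source rank, the search is confined to one cluster's queue rather than a single global list, which is what shortens the traversals; queues of different clusters could also be searched by separate threads in parallel.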
