Implementing the MPI process topology mechanism

The topology functionality of the Message Passing Interface (MPI) provides a portable, architecture-independent means for adapting application programs to the communication architecture of the target hardware. However, current MPI implementations rarely go beyond the most trivial implementation and simply perform no process remapping. We discuss the potential of the topology mechanism for systems with a hierarchical communication architecture, such as clusters of SMP nodes. The MPI topology functionality is a weak mechanism, and we discuss some of its shortcomings. We formulate the topology optimization problem as a graph embedding problem and show that for hierarchical systems it can be solved by graph partitioning. We state the properties of a new heuristic for solving both the embedding problem and the "easier" graph partitioning problem. The graph-partitioning-based framework has been fully implemented in MPI/SX for the NEC SX-series of parallel vector computers. MPI/SX is thus one of very few MPI implementations with a non-trivial topology functionality. On a 4-node NEC SX-6, significant communication performance improvements are achieved with synthetic MPI benchmarks.
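To make the graph partitioning view concrete, the following is a minimal sketch and not the heuristic implemented in MPI/SX. It assumes the application's communication pattern is given as a weighted edge list and that each process is assigned to one SMP node; the cost of a mapping is then the total weight of edges that cross node boundaries. The function names `cut_weight` and `refine_by_swaps`, the ring example, and the simple pairwise-swap refinement are all illustrative assumptions.

```python
from itertools import combinations


def cut_weight(edges, node_of):
    """Total weight of communication edges whose endpoints sit on different nodes."""
    return sum(w for (u, v), w in edges.items() if node_of[u] != node_of[v])


def refine_by_swaps(edges, node_of):
    """Greedy refinement (illustrative only): swap two processes on different
    nodes whenever the swap strictly reduces the cut weight. Swapping keeps
    the number of processes per node unchanged."""
    node_of = dict(node_of)  # work on a copy
    best = cut_weight(edges, node_of)
    improved = True
    while improved:
        improved = False
        for u, v in combinations(sorted(node_of), 2):
            if node_of[u] == node_of[v]:
                continue
            node_of[u], node_of[v] = node_of[v], node_of[u]
            cost = cut_weight(edges, node_of)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo a swap that did not help.
                node_of[u], node_of[v] = node_of[v], node_of[u]
    return node_of


# Hypothetical example: 8 processes communicating in a ring, 2 nodes with
# 4 processes each, initially placed round-robin (rank i on node i % 2).
edges = {(i, (i + 1) % 8): 1 for i in range(8)}
round_robin = {i: i % 2 for i in range(8)}
refined = refine_by_swaps(edges, round_robin)

print("cut before:", cut_weight(edges, round_robin))
print("cut after: ", cut_weight(edges, refined))
```

With round-robin placement every ring edge crosses a node boundary, so a remapping that groups neighbouring ranks on the same node reduces inter-node traffic. A greedy refinement like this one can stop in a local optimum, which is one reason stronger partitioning heuristics are of interest.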
