High-Quality Hierarchical Process Mapping

Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation when processing graphs on a parallel computer. When the topology of the distributed system is known, an important follow-up task is to map the blocks of the partition onto the processors such that the overall communication cost is reduced. We present novel multilevel algorithms that integrate graph partitioning and process mapping. Important ingredients of our algorithm include fast label propagation, more localized local search, initial partitioning, and a compressed data structure that computes processor distances without storing a distance matrix. Experiments indicate that our algorithms speed up the overall mapping process and, due to the integrated multilevel approach, also find much better solutions in practice. For example, one configuration of our algorithm yields better solutions than the previous state of the art in terms of mapping quality while being a factor of 62 faster. Compared to Scotch, currently the fastest iterated multilevel mapping algorithm, we obtain 16% better solutions while investing slightly more running time.
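The communication cost that process mapping reduces, and the general idea of answering processor-distance queries without an n x n matrix, can be illustrated with a small Python sketch. It rests on several assumptions. First, the machine is a homogeneous hierarchy written as S = a_1:a_2:...:a_k, for example 4 cores per processor, 2 processors per node, and 2 nodes. Second, each level i has a single distance d_i. Third, PEs are numbered so that every group at every level occupies a contiguous ID range. Fourth, the objective is J = sum over communication edges {u, v} of C_uv * D(Pi(u), Pi(v)). The names `HierarchicalDistance` and `mapping_cost` are hypothetical, and the sketch does not reproduce the authors' data structure. It stores only k prefix products and answers each distance query in O(k) time.

```python
from itertools import accumulate
from operator import mul


class HierarchicalDistance:
    """Hypothetical distance oracle for a homogeneous hierarchy a_1:a_2:...:a_k.

    Assumes PEs 0..n-1 are numbered so that consecutive runs of a_1 PEs share
    level 1 (e.g. a processor), runs of a_1*a_2 PEs share level 2 (e.g. a node),
    and so on. Only O(k) integers are stored instead of an n x n matrix.
    """

    def __init__(self, sizes, dists):
        if len(sizes) != len(dists) or not sizes:
            raise ValueError("need one distance per hierarchy level")
        self.dists = list(dists)
        # prefix[i] = a_1 * ... * a_{i+1}: number of PEs in one level-(i+1) group
        self.prefix = list(accumulate(sizes, mul))
        self.num_pes = self.prefix[-1]

    def __call__(self, x, y):
        if not (0 <= x < self.num_pes and 0 <= y < self.num_pes):
            raise ValueError("PE id out of range")
        if x == y:
            return 0
        # The lowest level whose group contains both PEs determines the distance.
        for group_size, d in zip(self.prefix[:-1], self.dists[:-1]):
            if x // group_size == y // group_size:
                return d
        # Only the top level (the whole machine) contains both PEs.
        return self.dists[-1]


def mapping_cost(edges, assignment, dist):
    """Sum of C_uv * D(Pi(u), Pi(v)) over communication edges (u, v, C_uv).

    `assignment[v]` is the PE that process/block v is mapped to; each
    undirected edge is expected to appear once in `edges`.
    """
    return sum(w * dist(assignment[u], assignment[v]) for u, v, w in edges)


if __name__ == "__main__":
    d = HierarchicalDistance([4, 2, 2], [1, 10, 100])  # 16 PEs
    chain = [(0, 1, 5), (1, 2, 3), (2, 3, 1)]
    print(mapping_cost(chain, [0, 4, 8, 9], d))  # spread out: 50 + 300 + 1 = 351
    print(mapping_cost(chain, [0, 1, 2, 3], d))  # one processor: 5 + 3 + 1 = 9
```

In the usage example, placing the heavily communicating processes 0 and 1 on different processors, and 1 and 2 on different nodes, inflates the cost from 9 to 351. That gap is what an integrated partitioning and mapping approach tries to avoid. When every a_i is a power of two, the division loop could plausibly be replaced by XOR and bit-position tricks, but this sketch keeps the general integer-division form.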
