The Zoltan and Isorropia parallel toolkits for combinatorial scientific computing: Partitioning, ordering and coloring

Partitioning and load balancing are important problems in scientific computing that can be modeled as combinatorial problems using graphs or hypergraphs. The Zoltan toolkit was developed primarily for partitioning and load balancing to support dynamic parallel applications, but has expanded to support other problems in combinatorial scientific computing, including matrix ordering and graph coloring. Zoltan is based on abstract user interfaces and uses callback functions. To simplify the use and integration of Zoltan with matrix-based frameworks, such as those in Trilinos, we developed Isorropia, a Trilinos package that supports most of Zoltan’s features via a matrix-based interface. Beyond providing this easy-to-use interface to Zoltan, Isorropia also serves as a platform for additional matrix algorithms. In this paper, we give an overview of the Zoltan and Isorropia toolkits, their design, capabilities and use. We also show how Zoltan and Isorropia enable large-scale, parallel scientific simulations, and describe current and future development in the next-generation package Zoltan2.
