Multi-Jagged: A Scalable Parallel Spatial Partitioning Algorithm

Geometric partitioning is fast and effective for load balancing dynamic applications, particularly those that require geometric locality of data (e.g., particle methods and crash simulations). We present, to our knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. The traditional recursive coordinate bisection algorithm (RCB) recursively bisects subdomains perpendicular to their longest dimension until the desired number of parts is obtained. In contrast, our algorithm performs recursive multisection with a specified number of parts in each dimension. By computing multiple cut lines concurrently and deciding intelligently when to migrate data while computing the partition, we reduce data movement relative to efficient implementations of recursive bisection. We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan on both real and synthetic datasets. Our experiments show that the proposed algorithm runs faster and scales better than RCB without degrading the load balance. Our implementation partitions 24 billion points into 65,536 parts within a few seconds and exhibits near-perfect weak scaling up to 6K cores.
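The difference between multisection and bisection can be illustrated with a small sequential sketch. It is not the authors' parallel implementation. It assumes unit point weights, so it balances point counts only. It also places all k-1 cuts of a dimension in one pass by sorting, which is a simplification standing in for the concurrent parallel cut-line computation described above. The function name `multi_jagged` and the parameter `parts_per_dim` are hypothetical. Because each slab recomputes its own cuts in the next dimension, the cut lines do not line up across slabs, which is what makes the partition "jagged".

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def multi_jagged(points: Sequence[Point], parts_per_dim: Sequence[int]) -> List[int]:
    """Hypothetical sketch: assign each point a part id in [0, prod(parts_per_dim)).

    Assumes unit weights (balances point counts) and uses sorting to place
    all cuts of one dimension at once; ties are split in input order.
    """
    if any(k < 1 for k in parts_per_dim):
        raise ValueError("every entry of parts_per_dim must be >= 1")
    part_of = [0] * len(points)

    def recurse(indices: List[int], dim: int, first_part: int) -> None:
        # All dimensions processed: every point here belongs to one part.
        if dim == len(parts_per_dim):
            for i in indices:
                part_of[i] = first_part
            return
        k = parts_per_dim[dim]
        # Number of final parts contained in each slab at this level.
        parts_below = math.prod(parts_per_dim[dim + 1:])
        # Sort along the current dimension; the k-1 cuts split this order
        # into k slabs of (nearly) equal point count.
        ordered = sorted(indices, key=lambda i: points[i][dim])
        n = len(ordered)
        for slab in range(k):
            lo = slab * n // k
            hi = (slab + 1) * n // k
            # Each slab is cut independently in the next dimension (jagged cuts).
            recurse(ordered[lo:hi], dim + 1, first_part + slab * parts_below)

    recurse(list(range(len(points))), 0, 0)
    return part_of


# The y-cut falls between y=0 and y=1 in the left slab but between
# y=2 and y=3 in the right slab.
print(multi_jagged([(0, 0), (0, 1), (1, 2), (1, 3)], [2, 2]))
```

With `parts_per_dim = [4, 4]`, this sketch produces 16 parts in two levels of recursion, whereas recursive bisection needs log2(16) = 4 levels to produce the same 16 parts.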