Controlling Unstructured Mesh Partitions for Massively Parallel Simulations

Parallel simulations at extreme scale require that the mesh be distributed across a large number of processors with balanced workload and minimal interpart communication. A number of algorithms have been developed to meet these goals, e.g., graph/hypergraph-based and coordinate-based methods. However, global implementations of current approaches can fail at very large core counts; this is resolved by combining global and local partitioning using multiple parts per processor. A second limitation of graph/hypergraph-based partitioning is that it uses only one type of mesh entity as graph nodes, so the balance of other mesh entities may not be optimal. In three-dimensional (3-D) linear finite element analysis, it is common to select mesh regions (elements) as the partition objects. In the examples considered here, the regions are well balanced up to 163,840 parts for a 1.07 billion element mesh, while the vertex imbalance is as high as 19.52%. Two methods are developed that work in conjunction with graph/hypergraph-based procedures to provide improved partitions. Example computations executed on an IBM Blue Gene/P system using up to 163,840 cores demonstrate the usefulness of the procedures, particularly for time-critical calculations where individual cores may be lightly loaded in terms of the number of mesh entities per core. The algorithms presented in this paper reduced the vertex imbalance from 17.8% to 4.97% for a partition with 131,072 parts and accelerated the equation solution phase of the finite element analysis by 10.4%.
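
To make the distinction between region balance and vertex balance concrete, the following minimal Python sketch counts both entity types per part for a toy element-based partition. It is an illustration under stated assumptions, not the procedure used in the paper: imbalance is assumed here to mean the maximum per-part count divided by the mean, minus one, and a vertex on a part boundary is assumed to be counted on every part that uses it. The function names and the toy mesh are hypothetical.

```python
def imbalance(counts):
    """Return max/mean - 1 for a list of per-part entity counts.

    Assumed definition; the paper may define imbalance differently.
    """
    mean = sum(counts) / len(counts)
    return max(counts) / mean - 1.0


def part_entity_counts(elements, elem_part, num_parts):
    """Count regions and vertices held by each part.

    elements  : list of tuples of vertex ids, one tuple per region
    elem_part : part id assigned to each region
    A vertex shared by several parts is counted once on each of them.
    """
    region_counts = [0] * num_parts
    part_vertices = [set() for _ in range(num_parts)]
    for verts, part in zip(elements, elem_part):
        region_counts[part] += 1
        part_vertices[part].update(verts)
    vertex_counts = [len(s) for s in part_vertices]
    return region_counts, vertex_counts


# Hypothetical toy mesh: four tetrahedra split evenly over two parts.
elements = [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5), (6, 7, 8, 9)]
elem_part = [0, 0, 1, 1]

regions, vertices = part_entity_counts(elements, elem_part, num_parts=2)
region_imbalance = imbalance(regions)
vertex_imbalance = imbalance(vertices)
print(regions, vertices, region_imbalance, vertex_imbalance)
```

In this toy case both parts hold two regions, yet one part touches five vertices and the other eight, so a partition that is perfectly balanced in regions is still unbalanced in vertices. The effect matters most when parts are small: a 1.07 billion element mesh on 163,840 parts leaves only about 6,500 elements per part.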
