Towards Exascale Parallel Delaunay Mesh Generation

Mesh generation is a critical component of many (bio-)engineering applications. However, parallel mesh generation codes, which these applications need in order to take full advantage of high-end computing platforms, belong to the broader class of adaptive and irregular problems and are among the most complex, challenging, and labor-intensive codes to develop and maintain. As a result, parallel mesh generation is one of the last applications to be ported to new parallel architectures. In this paper we present a way to remedy this problem for new, highly scalable architectures. We present a multi-layered tetrahedral/triangular mesh generation approach capable of delivering and sustaining close to 10^18 concurrent work units. We achieve this by exploiting concurrency at different levels of granularity with a hybrid algorithm and by carefully matching these levels to the hierarchy of the hardware architecture. This paper makes two contributions: (1) a new evolutionary path for developing multi-layered parallel mesh generation codes that can increase the concurrency of state-of-the-art parallel mesh generation methods by at least 10 orders of magnitude, and (2) a new abstraction for multi-layered runtime systems targeting parallel mesh generation codes, which efficiently orchestrates intra- and inter-layer data movement and load balancing on current and emerging multi-layered architectures with deep memory and network hierarchies.
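
Why nesting granularity levels can reach such large counts is easiest to see as arithmetic. The Python sketch below illustrates that idea under stated assumptions and is not the paper's implementation. The three layer names, the hardware level each is mapped to, and the per-layer counts are all hypothetical. They were picked only so the product lands on the 10^18 figure quoted above. `Layer` and `total_concurrency` are likewise illustrative names. The key assumption is that every work unit at one layer can independently expose all work units of the layer below it, so concurrency multiplies across layers instead of adding.

```python
import math
from dataclasses import dataclass


@dataclass
class Layer:
    """One level of a hypothetical multi-layered concurrency hierarchy."""
    name: str        # granularity level (illustrative label only)
    hardware: str    # hardware level this granularity is mapped to (assumed)
    work_units: int  # independent work units available at this level (assumed)


def total_concurrency(layers):
    """Aggregate concurrency of nested layers.

    Assumes full nesting: each unit at one layer can run all units of
    the next layer independently, so the per-layer counts multiply.
    """
    return math.prod(layer.work_units for layer in layers)


# Hypothetical layer sizes, NOT taken from the paper.
layers = [
    Layer("coarse: mesh subdomains", "compute node", 10**6),
    Layer("medium: concurrent cavity refinements", "core", 10**8),
    Layer("fine: loop-level work inside a cavity", "hardware thread", 10**4),
]

if __name__ == "__main__":
    for layer in layers:
        print(f"{layer.name:40s} -> {layer.hardware:16s} {layer.work_units:.0e}")
    total = total_concurrency(layers)
    print(f"total work units ~ 10^{round(math.log10(total))}")
```

Multiplying rather than adding is what lets a modest gain at each layer compound into many orders of magnitude overall. In practice the achievable figure depends on load balance and data movement between layers, which is what the runtime abstraction described above is meant to manage.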
