A multigrain Delaunay mesh generation method for multicore SMT-based architectures

Given the proliferation of layered multicore and simultaneous multithreading (SMT) architectures, it is imperative to deploy and evaluate important multi-level scientific computing codes, such as meshing algorithms, on these systems. We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level, medium-grain parallelism at the cavity level, and fine-grain parallelism at the element level. This multigrain data-parallel approach targets clusters built from commercially available SMT and multicore processors. Exploiting coarse-grain parallelism provides scalability in both execution time and problem size on loosely coupled clusters. Exploiting medium-grain parallelism improves performance at the single-node level. Our experimental evaluation shows that the first generation of SMT cores cannot take advantage of fine-grain parallelism in PCDM. Many of our experimental findings with PCDM extend to other adaptive and irregular multigrain parallel algorithms as well.
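To make the three granularity levels concrete, the following is a minimal Python sketch, not the PCDM implementation. It assumes a Bowyer-Watson style insertion in which the cavity of a new point is the set of triangles whose circumcircles contain it. All names (`in_circumcircle`, `cavity`, `select_independent_insertions`, `strip_mesh`) are hypothetical, and the conflict rule (two cavities conflict when they share a triangle) is a simplification; a real implementation would likely need a stricter rule and would also retriangulate each cavity.

```python
from concurrent.futures import ThreadPoolExecutor


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(tri, p):
    """Element level: True if p lies strictly inside the circumcircle of tri."""
    a, b, c = tri
    (ax, ay), (bx, by), (cx, cy) = [(v[0] - p[0], v[1] - p[1]) for v in (a, b, c)]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # Multiplying by the orientation makes the test independent of vertex order.
    return det * orient(a, b, c) > 0


def cavity(triangles, p):
    """Cavity level: indices of all triangles whose circumcircle contains p.

    Each in-circle test is independent (the fine-grain work); this sketch
    simply runs the tests sequentially.
    """
    return frozenset(i for i, t in enumerate(triangles) if in_circumcircle(t, p))


def select_independent_insertions(triangles, points):
    """Medium grain within one subdomain: expand cavities concurrently, then
    keep the points whose cavities share no triangle (simplified conflict rule).
    """
    with ThreadPoolExecutor() as pool:
        cavities = list(pool.map(lambda p: cavity(triangles, p), points))
    accepted, claimed = [], set()
    for p, cav in zip(points, cavities):
        if claimed.isdisjoint(cav):
            claimed |= cav
            accepted.append(p)
    return accepted


def strip_mesh(n):
    """Hypothetical test subdomain: n unit squares in a row, two triangles each."""
    tris = []
    for k in range(n):
        tris.append(((k, 0), (k + 1, 0), (k + 1, 1)))
        tris.append(((k, 0), (k + 1, 1), (k, 1)))
    return tris


# Coarse grain: in a cluster setting, each subdomain would be handled by a
# separate process or node. Here a single subdomain is processed.
subdomain = strip_mesh(4)
candidates = [(0.5, 0.4), (0.9, 0.5), (3.5, 0.4)]
accepted = select_independent_insertions(subdomain, candidates)
```

In this sketch the first and third candidate points have disjoint cavities and could be inserted concurrently, while the second overlaps the first and would be deferred to a later round.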
