Paper information - Multigrain parallel Delaunay Mesh generation: challenges and opportunities for multithreaded architectures


Given the importance of parallel mesh generation in large-scale scientific applications and the proliferation of multilevel SMT-based architectures, it is imperative to gain insight into the interaction between meshing algorithms and these systems. We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level and fine-grain parallelism at the element level. This multigrain data-parallel approach targets clusters built from low-end, commercially available SMTs. Our experimental evaluation shows that current SMTs are not capable of executing fine-grain parallelism in PCDM. However, experiments on a simulated SMT indicate that with modest hardware support it is possible to exploit fine-grain parallelism opportunities. The exploitation of fine-grain parallelism results in higher performance than a pure MPI implementation and closes the gap between the performance of PCDM and the state-of-the-art sequential mesher on a single physical processor. Our findings extend to other adaptive and irregular multigrain parallel algorithms.
