Finite element assembly strategies on multi‐core and many‐core architectures

We demonstrate that radically different implementations of finite element methods are needed on multi-core (CPU) and many-core (GPU) architectures if their respective performance potential is to be realised. Our experimental investigations using a finite element advection-diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and diverse algorithmic choices that cut across the high-level structure of the implementation. Making these commitments to achieve high performance on a single architecture leads to a loss of performance portability. Data structures that include redundant data but enable coalesced memory accesses are faster on many-core architectures, whereas redundancy-free data structures that are accessed indirectly are faster on multi-core architectures. The Addto algorithm for global assembly is optimal on multi-core architectures, whereas the Local Matrix Approach is optimal on many-core architectures despite requiring more computation than the Addto algorithm. These results demonstrate the value of making the correct choice of algorithm and data structure when implementing finite element methods, spectral element methods and low-order discontinuous Galerkin methods on modern high-performance architectures. Copyright © 2011 John Wiley & Sons, Ltd.
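To make the contrast between the two assembly strategies concrete, the following is a minimal serial sketch. It assumes a uniform 1D mesh of linear (P1) elements and the Laplacian stiffness operator, which is not the advection-diffusion solver studied here, and all function names are hypothetical. Addto accumulates every local matrix into a single redundancy-free global sparse matrix through the element-to-node map. The Local Matrix Approach instead keeps one dense local matrix per element, so entries for shared nodes are stored redundantly, and applies the operator as gather, local product, scatter-add, i.e. y = M^T (L (M x)), where M gathers global values to element-local vectors and L is the block-diagonal matrix of local matrices.

```python
# Sketch only: 1D P1 elements, stiffness (Laplacian) operator.
# All names are hypothetical illustrations, not the paper's code.

def local_stiffness(h):
    """2x2 element stiffness matrix for a P1 element of length h."""
    return [[1.0 / h, -1.0 / h],
            [-1.0 / h, 1.0 / h]]

def make_mesh(n_elements, length=1.0):
    """Uniform 1D mesh: element e connects global nodes e and e + 1."""
    h = length / n_elements
    elem_nodes = [(e, e + 1) for e in range(n_elements)]
    return elem_nodes, h

def assemble_addto(elem_nodes, h):
    """Addto: add each local matrix into one global sparse matrix.

    The global matrix is a dict keyed by (row, col); entries for
    nodes shared between elements are summed, so none are duplicated.
    """
    A = {}
    for nodes in elem_nodes:
        Ke = local_stiffness(h)
        for i, gi in enumerate(nodes):
            for j, gj in enumerate(nodes):
                A[(gi, gj)] = A.get((gi, gj), 0.0) + Ke[i][j]
    return A

def spmv(A, x):
    """y = A x for the dict-based global matrix (indirect access)."""
    y = [0.0] * len(x)
    for (i, j), v in A.items():
        y[i] += v * x[j]
    return y

def assemble_lma(elem_nodes, h):
    """LMA: store one dense local matrix per element; no global matrix."""
    return [local_stiffness(h) for _ in elem_nodes]

def lma_matvec(local_mats, elem_nodes, x):
    """y = M^T (L (M x)): gather, apply local matrices, scatter-add."""
    y = [0.0] * len(x)
    for Ke, nodes in zip(local_mats, elem_nodes):
        xe = [x[g] for g in nodes]                      # gather: M x
        ye = [sum(Ke[i][j] * xe[j] for j in range(len(nodes)))
              for i in range(len(nodes))]               # local product: L (M x)
        for i, g in enumerate(nodes):
            y[g] += ye[i]                               # scatter-add: M^T
    return y
```

With four elements and x_i = i^2, both `spmv(assemble_addto(...), x)` and `lma_matvec(...)` return [-4.0, -8.0, -8.0, -8.0, 28.0], while the LMA stores 16 local entries against 13 global ones and redoes the gather and scatter on every product. In this serial sketch neither strategy is faster; it only shows that they apply the same operator with different data layouts and amounts of work, which is the trade-off the measurements above concern.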
