libCEED: Fast algebra for high-order element-based discretizations

Finite element methods are widely used to solve partial differential equations (PDE) in science and engineering, but their standard implementation (Arndt et al., 2020; Kirk et al., 2006; Logg et al., 2012) relies on assembling sparse matrices. Sparse matrix multiplication and triangular operations perform a scalar multiply and add for each nonzero entry, just 2 floating point operations (flops) per scalar that must be loaded from memory (Williams et al., 2009). Modern hardware is capable of nearly 100 flops per scalar streamed from memory (Rupp, 2020) so sparse matrix operations cannot achieve more than about 2% utilization of arithmetic units. Matrix assembly becomes even more problematic when the polynomial degree p of the basis functions is increased, resulting in O(p) storage and O(p) compute per degree of freedom (DoF) in d dimensions. Methods pioneered by the spectral element community (Deville et al., 2002; Orszag, 1980) exploit problem structure to reduce costs to O(1) storage and O(p) compute per DoF, with very high utilization of modern CPUs and GPUs. Unfortunately, highquality implementations have been relegated to applications and intrusive frameworks that are often difficult to extend to new problems or incorporate into legacy applications, especially when strong preconditioners are required.

[1]  K. Jarrod Millman,et al.  Array programming with NumPy , 2020, Nat..

[2]  Jed Brown,et al.  High-performance operator evaluations with ease of use: libCEED's Python interface , 2020, SciPy.

[3]  P. Fischer,et al.  High-Order Methods for Incompressible Fluid Flow , 2002 .

[4]  Benjamin S. Kirk,et al.  Library for Parallel Adaptive Mesh Refinement / Coarsening Simulations , 2006 .

[5]  Anders Logg,et al.  Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book , 2012 .

[6]  Jed Brown,et al.  Efficient Nonlinear Solvers for Nodal High-Order Finite Elements in 3D , 2010, J. Sci. Comput..

[7]  Louis J. Wicker,et al.  Numerical solutions of a non‐linear density current: A benchmark solution and comparisons , 1993 .

[8]  David Wells,et al.  The deal.II library, Version 9.1 , 2019, J. Num. Math..

[9]  Paul F. Fischer,et al.  Hybrid Multigrid/Schwarz Algorithms for the Spectral Element Method , 2005, J. Sci. Comput..

[10]  Stefano Zampini,et al.  The PetscSF Scalable Communication Layer , 2022, IEEE Transactions on Parallel and Distributed Systems.

[11]  Siu Kwan Lam,et al.  Numba: a LLVM-based Python JIT compiler , 2015, LLVM '15.

[12]  Alan Edelman,et al.  Julia: A Fresh Approach to Numerical Computing , 2014, SIAM Rev..

[13]  Samuel Williams,et al.  Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.

[14]  Stefano Zampini,et al.  MFEM: a modular finite element methods library , 2019, 1911.09220.

[15]  Luca Heltai,et al.  The deal.II library, Version 9.2 , 2020, J. Num. Math..

[16]  Ivo Babuska,et al.  The p and h-p Versions of the Finite Element Method, Basic Principles and Properties , 1994, SIAM Rev..

[17]  Christina Freytag,et al.  Using Mpi Portable Parallel Programming With The Message Passing Interface , 2016 .

[18]  Jack Dongarra,et al.  Efficient exascale discretizations: High-order finite element methods , 2021, Int. J. High Perform. Comput. Appl..

[19]  S. Orszag Spectral methods for problems in complex geometries , 1980 .

[20]  Martin Kronbichler,et al.  Scalability of high-performance PDE solvers , 2020, Int. J. High Perform. Comput. Appl..