Stencil Scaling for Vector-Valued PDEs on Hybrid Grids With Applications to Generalized Newtonian Fluids

Matrix-free finite element implementations for large applications provide an attractive alternative to standard sparse matrix data formats because they consume significantly less memory. Here, we show that they are also competitive in run time in the low-order case if combined with suitable stencil scaling techniques. We focus on variable-coefficient, vector-valued partial differential equations as they arise in many physical applications. The presented method scales constant reference stencils originating from a linear finite element discretization instead of evaluating the bilinear forms on-the-fly. It assumes the use of hierarchical hybrid grids and may be applied to vector-valued second-order elliptic partial differential equations directly or as part of more complicated problems. We provide theoretical and experimental performance estimates showing the advantages of this new approach over the traditional on-the-fly integration and stored matrix approaches. In our numerical experiments, we consider two specific mathematical models, namely linear elastostatics and incompressible Stokes flow. The final example considers a non-linear shear-thinning generalized Newtonian fluid. For this type of non-linearity, we present an efficient approach to compute a regularized strain rate, which is then used to define the node-wise viscosity. Depending on the compute architecture, we observed maximum speedups of 64% and 122% compared to the on-the-fly integration. The largest considered example involved solving a Stokes problem with 12288 compute cores on the state-of-the-art supercomputer SuperMUC-NG.
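To illustrate the idea of scaling a constant reference stencil, the following is a minimal sketch for a scalar 1D model problem, -(k u')' with linear elements on a uniform mesh. It is not the vector-valued method of this paper. It assumes one common scalar scaling rule: each off-diagonal reference entry is multiplied by the arithmetic mean of the nodal coefficients at its two end points, and the central entry is chosen so that the row sums to zero. The function name `apply_scaled_stencil` and all parameter names are hypothetical.

```python
import numpy as np


def apply_scaled_stencil(u, k, h):
    """Matrix-free application of a scaled 3-point stencil to u.

    u : nodal values of the function
    k : nodal values of the variable coefficient (assumed positive)
    h : uniform mesh width
    Boundary rows are left at zero; only interior nodes are updated.
    """
    # Constant reference off-diagonal entry of the linear FE stencil
    # for -u'' on a uniform 1D mesh: (1/h) * [-1, 2, -1].
    s_off = -1.0 / h

    out = np.zeros_like(u)

    # Assumed scaling rule: off-diagonal entries are scaled by the mean
    # of the coefficient at the centre node and at the neighbour node.
    w_left = 0.5 * (k[1:-1] + k[:-2]) * s_off
    w_right = 0.5 * (k[1:-1] + k[2:]) * s_off

    # Central entry enforces a zero row sum, so constants lie in the kernel.
    w_center = -(w_left + w_right)

    out[1:-1] = w_left * u[:-2] + w_center * u[1:-1] + w_right * u[2:]
    return out
```

Only the reference entry `s_off` and the nodal coefficient values are needed per node, so no element matrices are integrated or stored. With `k` identically one, the routine reduces to the unscaled reference stencil.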
