Optimising the performance of the spectral/hp element method with collective linear algebra operations

As computing hardware evolves, increasing core counts mean that memory bandwidth is becoming the deciding factor in attaining peak performance of numerical methods. High-order finite element methods, such as those implemented in the spectral/hp framework Nektar++, are particularly well-suited to this environment. Unlike low-order methods that typically utilise sparse storage, matrices representing high-order operators have greater density and richer structure. In this paper, we show how these qualities can be exploited to increase runtime performance on nodes that comprise a typical high-performance computing system, by amalgamating the action of key operators on multiple elements into a single, memory-efficient block. We investigate different strategies for achieving optimal performance across a range of polynomial orders and element types. As these strategies all depend on external factors such as BLAS implementation and the geometry of interest, we present a technique for automatically selecting the most efficient strategy at runtime.
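The idea of amalgamating an operator's action over many elements, and of choosing a strategy at runtime, can be illustrated with a minimal sketch. It is not the Nektar++ implementation: the function names (`apply_per_element`, `apply_collective`, `select_fastest`) are hypothetical, it assumes every element shares the same dense elemental operator, and it uses plain Python loops where a real code would call a BLAS matrix-matrix routine.

```python
import timeit


def matvec(op, u):
    """Dense matrix-vector product for a single element."""
    return [sum(a * x for a, x in zip(row, u)) for row in op]


def apply_per_element(op, elements):
    """Baseline strategy: one matrix-vector product per element."""
    return [matvec(op, u) for u in elements]


def apply_collective(op, elements):
    """Collective strategy: treat the element vectors as columns of one
    matrix and form the product in a single pass, so that each operator
    entry is read once and reused across all elements."""
    n_elmt = len(elements)
    out = [[0.0] * len(op) for _ in range(n_elmt)]
    for i, row in enumerate(op):
        for j, a in enumerate(row):
            if a == 0.0:
                continue  # skip structural zeros in the operator
            for e in range(n_elmt):
                out[e][i] += a * elements[e][j]
    return out


def select_fastest(strategies, op, elements, repeats=3):
    """Time each candidate on the actual data and return the name of the
    quickest one together with all measured timings (in seconds)."""
    timings = {
        name: min(timeit.repeat(lambda f=fn: f(op, elements),
                                number=1, repeat=repeats))
        for name, fn in strategies.items()
    }
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    # Hypothetical problem size: a 16x16 operator applied to 200 elements.
    op = [[float((i + j) % 3) for j in range(16)] for i in range(16)]
    elements = [[float(e + j) for j in range(16)] for e in range(200)]
    strategies = {"per_element": apply_per_element,
                  "collective": apply_collective}
    best, timings = select_fastest(strategies, op, elements)
    print("selected strategy:", best)
```

Because the fastest strategy depends on the operator size, the element count and the underlying linear algebra library, the selector measures rather than predicts; which strategy it picks will vary between machines.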
