Hyperbolic Diffusion in Flux Reconstruction: Optimisation through Kernel Fusion within Tensor-Product Elements

This initial study presents novel methods for fusing GPU kernels in the artificial compressibility method (ACM), using flux reconstruction on tensor-product elements with constant Jacobians. Fusion is made possible by hyperbolising the diffusion terms, which eliminates the expensive algorithmic steps needed to form the viscous stresses. Two fusion approaches are presented, offering differing levels of parallelism; this range is found to be necessary to accommodate the change in workload as the order of accuracy of the elements is increased. Several further optimisations of these approaches are demonstrated, including a generation-time memory manager which maximises resource usage. The fused kernels achieve speedups of 3-4 times, which compares favourably with a theoretical maximum speedup of 4. In three-dimensional test cases, the generated fused kernels are found to reduce total runtime by ${\sim}25\%$, and, when compared to the standard ACM formulation, simulations demonstrate that a speedup of $2.3$ times can be achieved.
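
As a brief illustration of what hyperbolising a diffusion term means, the following sketch writes the scalar one-dimensional diffusion equation $\partial_t u = \nu\,\partial_{xx} u$ as a first-order relaxation system in pseudo-time, in the style of the first-order system approach for diffusion. The symbols $q$ (auxiliary gradient variable), $T_r$ (relaxation time) and $\tau$ (pseudo-time) are assumptions introduced here for illustration; the formulation used for the ACM system in the paper may differ in detail.

```latex
% Hyperbolised scalar diffusion (illustrative sketch; q, T_r and \tau are assumed notation)
\begin{align}
  \frac{\partial u}{\partial \tau} &= \nu \frac{\partial q}{\partial x}, \\
  \frac{\partial q}{\partial \tau} &= \frac{1}{T_r}\left(\frac{\partial u}{\partial x} - q\right).
\end{align}
% At the pseudo-steady state the second equation gives q = \partial_x u,
% so the first recovers \nu\,\partial_{xx} u = 0.
%
% Written as a first-order system with a source term:
\begin{equation}
  \frac{\partial}{\partial \tau}
  \begin{bmatrix} u \\ q \end{bmatrix}
  +
  \frac{\partial}{\partial x}
  \begin{bmatrix} -\nu q \\ -u/T_r \end{bmatrix}
  =
  \begin{bmatrix} 0 \\ -q/T_r \end{bmatrix},
  \qquad
  \lambda = \pm\sqrt{\nu/T_r}.
\end{equation}
```

The flux Jacobian has real eigenvalues $\pm\sqrt{\nu/T_r}$, so the system is hyperbolic and the gradient $q$ is carried as a solution variable rather than being reconstructed in separate steps, which is consistent with the removal of the viscous-stress construction described above.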
