GPU performance analysis of a nodal discontinuous Galerkin method for acoustic and elastic models

Finite element schemes based on discontinuous Galerkin methods possess features amenable to massively parallel computing accelerated with general-purpose graphics processing units (GPUs). However, the computational performance of such schemes depends strongly on their implementation. Several implementation strategies have been proposed in the past: some rely exclusively on specialized compute kernels tuned for each operation, while others leverage BLAS libraries that provide optimized routines for basic linear algebra operations. In this paper, we present and analyze up-to-date performance results for different implementations, tested in a unified framework on a single NVIDIA GTX980 GPU. We show that specialized kernels written with a one-node-per-thread strategy are competitive for polynomial bases up to the fifth and seventh degrees for acoustic and elastic models, respectively. For higher degrees, a strategy that makes use of the NVIDIA cuBLAS library provides better results, reaching a net arithmetic throughput of 35.7% of the theoretical peak value.

Highlights
Several GPU implementations for time-domain wave simulations are compared.
The numerical schemes are based on a high-order discontinuous finite element method.
The implementations are profiled using the roofline model to highlight bottlenecks.
The best implementation depends on the polynomial degree of the basis functions.
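The roofline model used for profiling bounds the attainable throughput of a kernel by the smaller of the device's peak arithmetic rate and the product of the kernel's arithmetic intensity and the peak memory bandwidth. The following is a minimal sketch of that bound in Python. The function names and the device figures (1000 GFLOP/s, 100 GB/s) are hypothetical values chosen for illustration. They are not the GTX980 specifications and not the measurements reported in the paper.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Attainable throughput (GFLOP/s) under the roofline model.

    arithmetic_intensity: floating-point operations per byte moved (flop/byte)
    peak_gflops:          peak arithmetic rate of the device (GFLOP/s)
    peak_bandwidth_gbs:   peak memory bandwidth of the device (GB/s)
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # Memory-bound kernels sit on the sloped part of the roof,
    # compute-bound kernels on the flat part.
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity (flop/byte) at which the two bounds meet."""
    return peak_gflops / peak_bandwidth_gbs


def fraction_of_peak(measured_gflops, peak_gflops):
    """Measured throughput as a fraction of the theoretical peak."""
    return measured_gflops / peak_gflops


# Hypothetical device, for illustration only.
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

low_intensity = roofline_gflops(2.0, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
high_intensity = roofline_gflops(50.0, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
```

In this sketch, a kernel whose arithmetic intensity lies below the ridge point is limited by memory bandwidth, while one above it is limited by arithmetic throughput. Comparing a measured rate with the bound at the kernel's intensity indicates which resource is the bottleneck.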
