Numerical integration on GPUs for higher order finite elements

The paper considers the implementation of numerical integration routines for higher order finite element approximations on graphics processors. The design of suitable GPU kernels is investigated in the context of general purpose integration procedures as well as particular example applications. The most important characteristic of the problem is the large variation in required processor and memory resources across different degrees of approximating polynomials. The questions that we try to answer are whether it is possible to design a single integration kernel for different GPUs and different orders of approximation, and what performance can be expected in such a case.
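To give a rough sense of how resource requirements vary with the polynomial degree, the sketch below counts per-element quantities for one assumed setting: a scalar problem on a tensor-product hexahedral element of degree p, integrated with p + 1 Gauss points per direction and stored as a dense double precision element matrix. The element type, the quadrature rule, and the function name `element_resources` are illustrative assumptions, not details taken from the paper.

```python
def element_resources(p, bytes_per_value=8):
    """Rough per-element sizes for degree p.

    Assumes a scalar problem on a tensor-product hexahedron with
    p + 1 Gauss points per direction (an illustrative choice only).
    """
    n_shape = (p + 1) ** 3       # shape functions per element
    n_quad = (p + 1) ** 3        # quadrature points per element
    n_entries = n_shape ** 2     # entries of the dense element matrix
    return {
        "shape_functions": n_shape,
        "quadrature_points": n_quad,
        "matrix_entries": n_entries,
        "matrix_bytes": n_entries * bytes_per_value,
    }


if __name__ == "__main__":
    for p in range(1, 8):
        r = element_resources(p)
        print(p, r["shape_functions"], r["quadrature_points"], r["matrix_bytes"])
```

Under these assumptions the element matrix grows from 512 bytes at p = 1 to 2 MiB at p = 7, which suggests why a kernel layout that fits in fast on-chip memory at low order may not fit at high order.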
