Finite Element Integration on GPUs

We present a novel finite element integration method for low-order elements on GPUs. We achieve more than 100 GF/s for element integration on first-order discretizations of both the Laplacian and elasticity operators on an NVIDIA GTX285, which has a nominal single-precision peak flop rate of 1 TF/s and a memory bandwidth of 159 GB/s, corresponding to a bandwidth-limited peak of 40 GF/s.
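
The bandwidth-limited figure can be reproduced with a simple streaming estimate. The sketch below is an assumption rather than the authors' stated derivation: it supposes that each single-precision operand occupies 4 bytes and that one flop is performed per operand read from memory. The symbols B (bandwidth), w (word size), and P_bw (bandwidth-limited rate) are introduced here for illustration only.

```latex
% Assumed model: one flop per 4-byte single-precision word streamed from memory.
\[
  P_{\mathrm{bw}} \;=\; \frac{B}{w}\times 1\,\frac{\text{flop}}{\text{word}}
  \;=\; \frac{159\ \text{GB/s}}{4\ \text{B/word}}
  \;=\; 39.75\ \text{GF/s} \;\approx\; 40\ \text{GF/s}.
\]
% The reported rate relative to this estimate:
\[
  \frac{100\ \text{GF/s}}{40\ \text{GF/s}} \;=\; 2.5 .
\]
```

Under this model, sustaining more than 100 GF/s requires on average more than 2.5 flops per word loaded from memory, which suggests that operands are reused on chip rather than streamed once.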
