A new sparse matrix vector multiplication graphics processing unit algorithm designed for finite element problems

Summary. Graphics processing units (GPUs) are increasingly used in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs require algorithms designed to take advantage of GPU hardware. Because sparse matrix vector (SPMV) multiplication is a common operation in finite element analysis, a new SPMV algorithm and several variations are developed for unstructured finite element meshes on GPUs. The effective bandwidths of current GPU algorithms and of the newly proposed algorithms are measured and analyzed for 15 sparse matrices of varying sizes and sparsity structures. The effects of optimization and the differences between the new GPU algorithm and its variants are then studied. Lastly, both new and current SPMV GPU algorithms are used in a GPU conjugate gradient (CG) solver in GPU finite element simulations of the heart, and the results are compared against those of a parallel PETSc finite element implementation. The effective bandwidth tests indicate that the new algorithms compare favorably with current algorithms for a wide variety of sparse matrices and can yield notable benefits. The GPU finite element simulation results demonstrate the benefit of using GPUs for finite element analysis and show that the proposed algorithms can yield speedup factors of up to 12-fold in real finite element applications. Copyright © 2015 John Wiley & Sons, Ltd.
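For readers unfamiliar with the two quantities the summary revolves around, the following is a minimal reference sketch of an SPMV product and of an effective bandwidth figure. It is not the algorithm proposed in the paper: it uses the standard compressed sparse row (CSR) layout on the CPU purely for illustration. The function names, the 8-byte value and 4-byte index sizes, and the byte-counting convention (each array read or written exactly once) are all assumptions, since the summary does not state how effective bandwidth is defined.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in CSR form (illustrative baseline)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def effective_bandwidth_gbs(n_rows, n_cols, nnz, seconds,
                            value_bytes=8, index_bytes=4):
    """Bytes nominally moved by one CSR SPMV, divided by its run time, in GB/s.

    Assumed convention: values, column indices, row pointers and x are each
    read once, and y is written once.
    """
    moved = (nnz * (value_bytes + index_bytes)   # values + column indices
             + (n_rows + 1) * index_bytes        # row pointers
             + n_cols * value_bytes              # input vector x
             + n_rows * value_bytes)             # output vector y
    return moved / seconds / 1e9


# Small tridiagonal example: A = [[4, 1, 0], [1, 4, 1], [0, 1, 4]].
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, values, x)
```

Under this convention, a kernel that finishes the same product in less time reports a higher effective bandwidth, which is what makes the figure usable for comparing storage formats and kernels across matrices of different sizes.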
