Generation of large finite-element matrices on multiple graphics processors

SUMMARY This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the limited memory of the GPUs and, at the same time, to accelerate the generation process, we propose generating the large sparse linear systems arising in finite-element analysis iteratively on several GPUs, while the CPUs concurrently collect and add the matrix fragments using a fast multithreaded procedure. The threads are scheduled so that the CPU operations do not affect the performance of the process, and the GPUs are idle only when data are being transferred from GPU to CPU. The approach is verified on two workstations. The first has two 6-core Intel Xeon X5690 processors and two Fermi GPUs, each a GeForce GTX 590 with two graphics processors and 1.5 GB of fast RAM. The second has two 12-core Opteron 6174 processors and two Tesla C2075 boards carrying 6 GB of RAM each. For the latter setup, we demonstrate the fast generation of sparse finite-element matrices with as many as 10 million unknowns and over 1 billion nonzero entries. Compared with the single-threaded and multithreaded CPU implementations, the GPU-based version of the algorithm reduces the finite-element matrix-generation time in double precision by factors of 100 and 30, respectively. Copyright © 2012 John Wiley & Sons, Ltd.
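
The division of labour described above (devices produce matrix fragments batch by batch, and a CPU thread collects and adds them) can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the paper's implementation: the GPUs are simulated by ordinary threads, the "kernel" builds 1-D linear elements with the local matrix [[1, -1], [-1, 1]], and the names `generate_fragment`, `device_worker`, and `assemble` are hypothetical.

```python
import queue
import threading
from collections import defaultdict


def generate_fragment(first, last):
    """Stand-in for a GPU kernel (assumption): returns COO triplets for
    the 1-D linear elements first..last-1."""
    triplets = []
    for e in range(first, last):
        # Element e couples nodes e and e+1 with local matrix [[1, -1], [-1, 1]].
        triplets += [(e, e, 1.0), (e, e + 1, -1.0),
                     (e + 1, e, -1.0), (e + 1, e + 1, 1.0)]
    return triplets


def device_worker(batches, fragments):
    """One thread per simulated device: process batches until none remain."""
    while True:
        try:
            first, last = batches.get_nowait()
        except queue.Empty:
            return
        fragments.put(generate_fragment(first, last))


def assemble(num_elements, batch_size=4, num_devices=2):
    """Generate the matrix in batches and sum the fragments on the CPU side."""
    batches = queue.Queue()
    for first in range(0, num_elements, batch_size):
        batches.put((first, min(first + batch_size, num_elements)))
    num_batches = batches.qsize()  # exact: no worker has started yet

    fragments = queue.Queue()
    workers = [threading.Thread(target=device_worker, args=(batches, fragments))
               for _ in range(num_devices)]
    for w in workers:
        w.start()

    # Collection runs while the workers are still generating later batches.
    matrix = defaultdict(float)
    for _ in range(num_batches):
        for i, j, value in fragments.get():
            matrix[(i, j)] += value  # duplicate entries are summed

    for w in workers:
        w.join()
    return dict(matrix)
```

Calling `assemble(10)` returns a dictionary keyed by `(row, column)` that holds the assembled tridiagonal matrix. Processing in batches keeps each fragment small enough for a memory-limited device, while the queue lets collection overlap with generation; a real implementation would replace the worker body with kernel launches and device-to-host transfers.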
