Global finite element matrix construction based on a CPU-GPU implementation

The finite element method (FEM) involves several computational steps to solve a problem numerically. Much effort has been directed at accelerating the solution of the resulting linear system of equations, whereas the construction of the finite element matrix, which is also time-consuming for unstructured meshes, has received less attention. The global finite element matrix is generated in two steps: computing the local matrices by numerical integration and assembling them into a global system, a process traditionally carried out serially. This work presents a fast technique for constructing the global finite element matrix that arises when solving Poisson's equation in a three-dimensional domain. The proposed methodology performs the numerical integration, which is intrinsically parallel, on the graphics processing unit (GPU) and the matrix assembly, which is intrinsically serial, on the central processing unit (CPU). In the numerical integration, only the lower triangular part of each local stiffness matrix is computed, exploiting its symmetry, which saves GPU memory and computing time. Because of this symmetry, the global sparse matrix likewise holds non-zero entries only in its lower triangular part, which reduces the assembly operations and memory usage. The methodology makes it possible to generate the global sparse matrix for unstructured finite element meshes of any size on GPUs with little memory capacity; the mesh size is limited only by the CPU memory.
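The two-step construction described above can be illustrated with a minimal CPU-only sketch in plain Python. It is not the authors' implementation. It assumes linear (4-node) tetrahedral elements and the Laplace operator, for which the local integration has a closed form; the abstract does not state the element type or quadrature rule. The function names (`local_stiffness_lower`, `integrate_all`, `assemble_lower`) and the dictionary-based sparse storage are hypothetical choices made for illustration. The per-element loop in `integrate_all` stands in for the GPU stage, since each element is independent of the others.

```python
from collections import defaultdict

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def local_stiffness_lower(p0, p1, p2, p3):
    """Lower triangle (10 entries, row by row) of the 4x4 local stiffness
    matrix of a linear tetrahedron for the Laplace operator (assumed element)."""
    e1 = tuple(p1[k] - p0[k] for k in range(3))
    e2 = tuple(p2[k] - p0[k] for k in range(3))
    e3 = tuple(p3[k] - p0[k] for k in range(3))
    c23, c31, c12 = cross(e2, e3), cross(e3, e1), cross(e1, e2)
    det = dot(e1, c23)                      # determinant of the Jacobian
    volume = abs(det) / 6.0
    # Constant gradients of the four linear basis functions.
    g1 = tuple(v / det for v in c23)
    g2 = tuple(v / det for v in c31)
    g3 = tuple(v / det for v in c12)
    g0 = tuple(-(g1[k] + g2[k] + g3[k]) for k in range(3))
    grads = (g0, g1, g2, g3)
    # Only entries with i >= j are computed, exploiting symmetry.
    return [volume * dot(grads[i], grads[j])
            for i in range(4) for j in range(i + 1)]

def integrate_all(nodes, elements):
    """Integration stage: independent per element, so it maps naturally to
    one GPU thread per element (here a plain loop as a stand-in)."""
    return [local_stiffness_lower(*(nodes[n] for n in conn))
            for conn in elements]

def assemble_lower(elements, local_lower):
    """Assembly stage (serial): scatter local entries into the lower
    triangle of the global matrix, stored as {(row, col): value}."""
    K = defaultdict(float)
    for conn, loc in zip(elements, local_lower):
        idx = 0
        for i in range(4):
            for j in range(i + 1):
                r, c = conn[i], conn[j]
                if r < c:                   # keep row >= col
                    r, c = c, r
                K[(r, c)] += loc[idx]
                idx += 1
    return dict(K)

# Two tetrahedra sharing the face (1, 2, 3).
nodes = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
elements = [(0, 1, 2, 3), (1, 2, 3, 4)]
K = assemble_lower(elements, integrate_all(nodes, elements))
```

Storing 10 values per element instead of 16 is what reduces the memory footprint of the integration stage in this sketch. Because `integrate_all` could be called on successive batches of elements, the number of local matrices held at once can be bounded independently of the mesh size.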
