Fine-grained GPU implementation of assembly-free iterative solver for finite element problems

A fine-grained GPU instance of a matrix-free CG solver for FEM is proposed. The parallelization potential of SIMD architectures is exploited. The workload is well balanced across all the threads of the GPU architecture. The factors affecting the proposal are studied using diverse GPU instances.

This paper proposes a fine-grained implementation of a matrix-free Conjugate Gradient (CG) solver for Finite Element Analysis (FEA) on Graphics Processing Unit (GPU) architectures. The use of GPU computing in FEA is an active research field today. This is primarily because current GPU sparse solvers are only partially parallelizable and can hardly make use of the Data-Level Parallelism (DLP) for which GPU architectures are designed. The proposed GPU instance takes advantage of Massively Parallel Processing (MPP) architectures by performing well-balanced parallel calculations at the Degree-of-Freedom (DoF) level of the finite elements. The numerical experiments evaluate and analyze the performance of diverse GPU instances of the matrix-free CG solver.
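To make the term "matrix-free" concrete, the following is a minimal serial sketch in Python, not the paper's GPU implementation. It assumes a hypothetical 1D bar discretized with two-node linear elements, and every name in it (`ebe_matvec`, `matrix_free_cg`, `k_e`, `fixed`) is introduced here for illustration only. The global stiffness matrix is never assembled: each CG iteration computes the product K·p by looping over elements and accumulating local contributions. That element loop is the part a GPU implementation would distribute across threads.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def ebe_matvec(x, n_elems, k_e, fixed):
    """Return y = K x element by element, without assembling K.

    Assumed model: 1D bar, element e connects global DoFs e and e + 1,
    element matrix k_e * [[1, -1], [-1, 1]]. Constrained DoFs in `fixed`
    get an identity row and a zeroed column, which keeps the operator
    symmetric positive definite.
    """
    y = [0.0] * len(x)
    for e in range(n_elems):
        i, j = e, e + 1
        xi = 0.0 if i in fixed else x[i]
        xj = 0.0 if j in fixed else x[j]
        # Local element product scattered into the global result.
        y[i] += k_e * (xi - xj)
        y[j] += k_e * (xj - xi)
    for d in fixed:
        y[d] = x[d]
    return y


def matrix_free_cg(apply_A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG that only needs the action v -> A v."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # initial search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = apply_A(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Hypothetical test case: unit-length bar, EA = 1, 4 elements,
# clamped at node 0, unit load at the free end.
n_elems = 4
k_e = 1.0 / (1.0 / n_elems)      # EA / h
fixed = {0}
f = [0.0, 0.0, 0.0, 0.0, 1.0]
u = matrix_free_cg(lambda v: ebe_matvec(v, n_elems, k_e, fixed), f)
# The nodal displacements should match the exact solution u(x) = x,
# i.e. approximately [0, 0.25, 0.5, 0.75, 1.0].
```

In a parallel version, the scatter `y[i] += ...` is where threads handling neighboring elements would collide on shared DoFs, so the work has to be organized to avoid or resolve those conflicts, for example by assigning the work per DoF rather than per element.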
