Finite element numerical integration for first order approximations on multi-core architectures

The paper investigates the implementation and performance of the finite element numerical integration algorithm for first order approximations on three processor architectures popular in scientific computing: classical CPU, Intel Xeon Phi and NVIDIA Kepler GPU. A unifying programming model and a portable OpenCL implementation are considered for all architectures. Variations of the algorithm due to different problems solved and different element types are investigated, and several optimizations aimed at properly mapping the algorithm to the computer architectures are demonstrated. Performance models of execution are developed for the different processors and tested in practical experiments. The results show varying levels of performance across the architectures, but indicate that the algorithm can be effectively ported to all of them. The general conclusion is that finite element numerical integration can achieve sufficient performance on different multi- and many-core architectures and should not become a performance bottleneck for finite element simulation codes.
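
For readers unfamiliar with the algorithm, the following is a minimal serial sketch of numerical integration for one first order element. It is not the paper's OpenCL implementation. It assumes a Laplace operator on a linear triangle, and the function name `element_stiffness_p1` and the single-point quadrature rule are illustrative choices made here, not taken from the paper.

```python
import numpy as np

def element_stiffness_p1(coords):
    """Stiffness matrix of the Laplace operator on one linear triangle.

    coords: (3, 2) array of vertex coordinates (hypothetical input layout).
    """
    coords = np.asarray(coords, dtype=float)
    # Gradients of the three linear shape functions on the reference triangle.
    ref_grads = np.array([[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
    # One-point rule: centroid, with weight equal to the reference area.
    quadrature = [((1.0 / 3.0, 1.0 / 3.0), 0.5)]

    # Jacobian of the affine map from the reference to the physical element.
    jac = coords.T @ ref_grads
    det_j = abs(np.linalg.det(jac))
    # Shape function gradients in physical coordinates (one row per function).
    grads = ref_grads @ np.linalg.inv(jac)

    stiffness = np.zeros((3, 3))
    # Loop over integration points; for linear elements the integrand is
    # constant, so a single point already integrates it exactly.
    for _point, weight in quadrature:
        stiffness += weight * det_j * (grads @ grads.T)
    return stiffness

unit_triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
A = element_stiffness_p1(unit_triangle)
```

In a full code this computation is repeated independently for every element of the mesh, which is what makes the algorithm a natural candidate for the multi- and many-core processors discussed above.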
