Finite element numerical integration for first order approximations on multi- and many-core architectures

Abstract

The paper presents an investigation of the performance of the finite element numerical integration algorithm for first order approximations on three processor architectures popular in scientific computing: the classical x86_64 CPU, the Intel Xeon Phi and the NVIDIA Kepler GPU. We base the discussion on theoretical performance models and on our own implementations, for which we perform a range of computational experiments. For the implementations, we adopt a unifying programming model and a portable OpenCL code for all architectures. Variations of the algorithm resulting from the different problems solved and the different element types are investigated, and several optimizations aimed at properly mapping the algorithm to each architecture are demonstrated. The experimental results show varying levels of performance across the architectures, but indicate that the algorithm can be effectively ported to all of them. The conclusions identify the factors that limit performance for different problems and types of approximation, as well as the performance ranges that can be expected for FEM numerical integration on the different processor architectures.
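To make the term "numerical integration for first order approximations" concrete, the following is a minimal sketch, not the authors' OpenCL kernel. It assumes the Laplace (Poisson) problem, linear tetrahedral elements and a one-point quadrature rule; the names `stiffness_p1_tet`, `det3`, `inv3`, `REF_GRAD` and `QUAD_WEIGHTS` are hypothetical. It shows the usual loop structure (quadrature points, then test and trial shape functions) and that, for affine first order elements, the Jacobian and shape function gradients are constant over the element, so the arithmetic work per element is small.

```python
# Hypothetical sketch: element stiffness matrix for -Laplace(u) with
# linear (first order) tetrahedral elements, pure Python, no dependencies.

# Gradients of the four linear shape functions on the reference
# tetrahedron with vertices (0,0,0), (1,0,0), (0,1,0), (0,0,1).
REF_GRAD = [
    (-1.0, -1.0, -1.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]

# Assumed one-point rule at the centroid; weight = reference element volume.
QUAD_WEIGHTS = [1.0 / 6.0]


def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def inv3(a, det):
    """Inverse of a 3x3 matrix: inv[i][j] = cofactor(j, i) / det.
    Cyclic indices yield the signed cofactor directly for 3x3 matrices."""
    return [[(a[(j + 1) % 3][(i + 1) % 3] * a[(j + 2) % 3][(i + 2) % 3]
              - a[(j + 1) % 3][(i + 2) % 3] * a[(j + 2) % 3][(i + 1) % 3]) / det
             for j in range(3)] for i in range(3)]


def stiffness_p1_tet(x):
    """K[i][j] = integral of grad(phi_i) . grad(phi_j) over the tetrahedron
    with vertex coordinates x[0..3]."""
    # Jacobian of the affine map: jac[r][c] = d x_r / d xi_c.
    jac = [[x[c + 1][r] - x[0][r] for c in range(3)] for r in range(3)]
    det = det3(jac)
    inv = inv3(jac, det)
    # Physical gradients grad_x(phi) = J^{-T} grad_xi(phi), constant for P1.
    grad = [[sum(inv[k][d] * g[k] for k in range(3)) for d in range(3)]
            for g in REF_GRAD]
    K = [[0.0] * 4 for _ in range(4)]
    for w in QUAD_WEIGHTS:            # loop over quadrature points
        vol = w * abs(det)            # weight times |det J|
        for i in range(4):            # test functions
            for j in range(4):        # trial functions
                K[i][j] += vol * sum(grad[i][d] * grad[j][d] for d in range(3))
    return K


K = stiffness_p1_tet([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)])
```

For the reference tetrahedron, `K[0][0]` is 0.5 and `K[0][1]` is -1/6, and every row sums to zero because constant functions lie in the kernel of the Laplace operator. A production kernel would process many elements at once and, for non-affine element types such as prisms, would recompute the Jacobian at each quadrature point.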