Toward performance portability of the Albany finite element analysis code using the Kokkos library

Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge for code developers today: parallel code must run correctly and with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations that arise in a wide variety of scientific, engineering, and industrial applications requiring HPC. This article presents preliminary results from our development of a performance-portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA graphics processing units (GPUs), Intel Xeon Phis, and multicore CPUs.
