HOMMEXX 1.0: a performance-portable atmospheric dynamical core for the Energy Exascale Earth System Model

Abstract. We present an architecture-portable and performant implementation of the atmospheric dynamical core (High-Order Methods Modeling Environment, HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPUs (e.g., Intel Xeon), many-core CPUs (e.g., Intel Xeon Phi Knights Landing), and Nvidia V100 GPUs.
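To make the notion of a layout-polymorphic multidimensional array concrete, the following is a minimal sketch in plain C++. It does not use Kokkos and is not the HOMMEXX implementation; the names `RowMajor`, `ColMajor`, `Array2D`, and `fill_index_code` are hypothetical and chosen only for illustration. The idea it shows is that the same indexing expression `a(i, j)` can be compiled against different memory layouts, so a kernel is written once while the layout is selected to suit the target architecture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (illustrative only, not the Kokkos API).
// Each policy maps a 2D index (i, j) to an offset in flat storage.
struct RowMajor {
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t /*n0*/, std::size_t n1) {
        return i * n1 + j;  // last index is contiguous
    }
};

struct ColMajor {
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t /*n1*/) {
        return j * n0 + i;  // first index is contiguous
    }
};

// A 2D array whose memory layout is a compile-time template parameter.
template <typename T, typename Layout>
class Array2D {
public:
    Array2D(std::size_t n0, std::size_t n1)
        : n0_(n0), n1_(n1), data_(n0 * n1) {}

    T& operator()(std::size_t i, std::size_t j) {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }
    const T& operator()(std::size_t i, std::size_t j) const {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }

    std::size_t extent0() const { return n0_; }
    std::size_t extent1() const { return n1_; }
    const T* data() const { return data_.data(); }  // flat storage, for inspection

private:
    std::size_t n0_, n1_;
    std::vector<T> data_;
};

// A "kernel" written once against the (i, j) interface; it works
// unchanged for any layout. Stores 10*i + j at entry (i, j).
template <typename Layout>
void fill_index_code(Array2D<int, Layout>& a) {
    for (std::size_t i = 0; i < a.extent0(); ++i)
        for (std::size_t j = 0; j < a.extent1(); ++j)
            a(i, j) = static_cast<int>(10 * i + j);
}
```

Calling `fill_index_code` on an `Array2D<int, RowMajor>` and on an `Array2D<int, ColMajor>` of the same extents yields identical values through `a(i, j)`, while the order of the values in the underlying flat storage differs. A performance-portability library builds on the same separation of indexing from storage, adding device memory spaces and parallel dispatch that this sketch omits.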
