Strong scaling for numerical weather prediction at petascale with the atmospheric model NUMA

Numerical weather prediction (NWP) is computationally challenging because of its inherently multiscale nature. Currently, the highest-resolution global NWP models use a horizontal resolution of 9 km. At this resolution, many important atmospheric processes are not resolved, which introduces errors. Increasing the resolution of NWP models requires highly scalable atmospheric models. The non-hydrostatic unified model of the atmosphere (NUMA), developed by the authors at the Naval Postgraduate School, was designed for this purpose. NUMA is used by the Naval Research Laboratory, Monterey, as the engine inside its next-generation weather prediction system NEPTUNE. NUMA solves the fully compressible Navier–Stokes equations by means of high-order Galerkin methods (both spectral element and discontinuous Galerkin methods can be used). Because NUMA does not make use of the shallow-atmosphere approximation, it is capable of running middle and upper atmosphere simulations. This article presents the performance analysis and optimization of the spectral element version of NUMA. The performance at different optimization stages is analyzed using a theoretical performance model as well as measurements via hardware counters. Machine-independent optimization is compared to machine-specific optimization using Blue Gene (BG)/Q vector intrinsics. The best portable version of the main computations was found to be about two times slower than the best non-portable version. By using vector intrinsics, the main computations reach 1.2 PFlops on the entire IBM Blue Gene supercomputer Mira (12% of the theoretical peak performance). The article also presents scalability studies for two idealized test cases that are relevant for NWP applications. NUMA delivers an excellent strong scaling efficiency of 99% on the entire supercomputer Mira using a mesh with 1.8 billion grid points. This allows running a global forecast of a baroclinic wave test case at a 3-km uniform horizontal resolution and double precision within the time frame required for operational weather prediction.
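
The two headline figures in the abstract (a strong scaling efficiency of 99% and 1.2 PFlops at 12% of peak) rest on simple arithmetic, which the following minimal sketch illustrates. It assumes the conventional definition of strong scaling efficiency for a fixed problem size, which the abstract itself does not spell out. The function names and the timings in the example are hypothetical and are not measurements from the article.

```python
def strong_scaling_efficiency(t_ref, p_ref, t, p):
    """Strong scaling efficiency of a run on p processes relative to a
    reference run on p_ref processes, for a fixed problem size.

    Assumed (conventional) definition: (t_ref * p_ref) / (t * p).
    A value of 1.0 means the run time shrinks exactly in proportion
    to the number of processes added.
    """
    if min(t_ref, p_ref, t, p) <= 0:
        raise ValueError("timings and process counts must be positive")
    return (t_ref * p_ref) / (t * p)


def implied_peak(achieved, fraction_of_peak):
    """Peak rate implied by an achieved rate and its stated fraction of peak."""
    if not 0 < fraction_of_peak <= 1:
        raise ValueError("fraction_of_peak must be in (0, 1]")
    return achieved / fraction_of_peak


if __name__ == "__main__":
    # Hypothetical timings, chosen only to illustrate the formula:
    # doubling the process count cuts the run time from 100.0 s to 50.5 s.
    eff = strong_scaling_efficiency(t_ref=100.0, p_ref=1024, t=50.5, p=2048)
    print(f"strong scaling efficiency: {eff:.1%}")

    # Figures quoted in the abstract: 1.2 PFlops at 12% of theoretical peak.
    peak = implied_peak(achieved=1.2, fraction_of_peak=0.12)
    print(f"implied theoretical peak: {peak:.1f} PFlops")
```

Under this definition, the 1.2 PFlops and 12% quoted in the abstract imply a theoretical peak of about 10 PFlops for the full machine.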
