Porting the COSMO Weather Model to Manycore CPUs

Weather and climate simulations are a major application driver in high-performance computing (HPC). With the end of Dennard scaling and Moore's law, the HPC industry increasingly employs specialized computation accelerators to increase computational throughput. Manycore architectures, such as Intel's Knights Landing (KNL), are a representative example of future processing devices. However, software has to be modified to use these devices efficiently. In this work, we demonstrate how an existing domain-specific language (DSL) that was designed for CPUs and GPUs can be extended to manycore architectures such as KNL. On hand-tuned representative stencils of the dynamical core (dycore) of the COSMO weather model and its radiation code, we achieve performance comparable to that of the NVIDIA Tesla P100 GPU architecture. Further, for the full DSL-based, GPU-optimized COSMO dycore code, we present performance within a factor of two of the P100. We find that optimizing code to full performance on modern manycore architectures requires effort and hardware knowledge similar to that required for GPUs. Finally, we show limitations of the present approaches and outline our lessons learned and possible principles for the design of future DSLs for accelerators in the weather and climate domain.
