Optimizing high-resolution Community Earth System Model on a heterogeneous many-core supercomputing platform

Abstract. With semiconductor technology gradually approaching its physical and thermal limits, recent supercomputers have adopted major architectural changes to continue increasing performance through more power-efficient heterogeneous many-core systems. Examples include Sunway TaihuLight, which has four management processing elements (MPEs) and 256 computing processing elements (CPEs) inside one processor, and Summit, which has two central processing units (CPUs) and six graphics processing units (GPUs) inside one node. Meanwhile, current high-resolution Earth system models, which desperately require more computing power, generally consist of millions of lines of legacy code developed for traditional homogeneous multicore processors and cannot automatically benefit from advances in supercomputer hardware. As a result, refactoring and optimizing the legacy models for new architectures have become key challenges on the road to taking advantage of greener and faster supercomputers, providing better support for the global climate research community, and contributing to the long-lasting societal task of addressing long-term climate change. This article reports the efforts of a large group in the International Laboratory for High-Resolution Earth System Prediction (iHESP), which was established through the cooperation of Qingdao Pilot National Laboratory for Marine Science and Technology (QNLM), Texas A&M University (TAMU), and the National Center for Atmospheric Research (NCAR), with the goal of enabling highly efficient simulations of the high-resolution (25 km atmosphere and 10 km ocean) Community Earth System Model (CESM-HR) on Sunway TaihuLight. The refactoring and optimizing efforts have improved the simulation speed of CESM-HR from 1 SYPD (simulation years per day) to 3.4 SYPD (with output disabled) and supported several hundred years of pre-industrial control simulations.
With further strategies for deeper refactoring and optimization of the remaining computing hotspots, as well as the redesign of architecture-oriented algorithms, we expect to achieve equivalent or even better efficiency on the new platform than on traditional homogeneous CPU platforms. The refactoring and optimizing processes detailed in this paper for the Sunway system should have implications for similar efforts on other heterogeneous many-core systems, such as GPU-based high-performance computing (HPC) systems.
