Progress towards accelerating HOMME on hybrid multi-core systems

We examine the suitability of HOMME, a spectral-element dynamical core within the Community Atmosphere Model (CAM), for GPU-based architectures and report initial performance results. This work is part of a project to enable CAM to run at high resolution on next-generation, multi-petaflop systems. The dynamical core is the present focus because it dominates the performance profile of our target problem. HOMME scales well because its underlying cubed-sphere mesh admits a full two-dimensional decomposition and all computational work is localized within each element. We describe the thread blocking and code changes that allow HOMME to use GPUs effectively, along with a rewritten vertical remapping scheme that improves performance on both CPUs and GPUs. Validation of results in the full HOMME model is also described. We demonstrate that the most expensive kernel in the model executes more than three times faster on the GPU than on the CPU. These improvements are expected to increase efficiency when incorporated into the full model configured for the target problem. Remaining performance issues include optimizing the boundary exchanges when multiple spectral elements are computed on the GPU.
