Graphics processing unit optimizations for the dynamics of the HIRLAM weather forecast model

Programmable graphics processing units (GPUs) offer tremendous computational resources for diverse applications. In this paper, we present an implementation of the dynamics routine of the HIRLAM weather forecast model on the NVIDIA GTX 480. The original Fortran code was converted manually to C and CUDA. We determine empirically the optimal number of grid points per thread and the best thread and block structures. A significant part of the elapsed time is spent transferring data between the CPU and the GPU. To reduce the impact of these transfer costs, we overlap computation and data transfer using multiple CUDA streams. We developed an algorithm that enables our code generator CTADEL to automatically generate the optimal CUDA streams program. Experiments assess whether GPUs are useful for numerical weather prediction, in particular for the dynamics part. Copyright © 2012 John Wiley & Sons, Ltd.
