GPU-enabled efficient executions of radiation calculations in climate modeling

In this paper, we discuss the acceleration of a climate model known as the Community Earth System Model (CESM). Graphics Processing Units (GPUs) are well known to accelerate computationally intensive scientific applications. This work harnesses GPUs to speed up the execution of CESM and improve model throughput. We focus on the two routines that consume the most execution time, namely radabs and radcswmx, which compute parameters related to longwave (infrared) and shortwave (visible and ultraviolet) radiation, respectively. We propose a novel asynchronous execution strategy in which the results computed by the GPU during the current time step are used by the CPU in the subsequent time step. This technique effectively hides the GPU computation time behind concurrent CPU work. By exploiting the parallelism offered by the GPU and executing asynchronously on the CPU and GPU, we obtain speed-ups of about 26× for radabs and about 5.6× for radcswmx.
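The lagged hand-off between GPU and CPU can be sketched as a small time loop. This is a minimal illustration under stated assumptions, not the authors' implementation: a single-worker `ThreadPoolExecutor` stands in for the GPU, and the hypothetical callables `gpu_kernel` and `cpu_step` stand in for an offloaded radiation routine and the CPU's remaining per-step work. The offloaded result launched at step t is consumed only at step t + 1, so the offloaded work can overlap the CPU's step.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional


def run_lagged_pipeline(
    n_steps: int,
    gpu_kernel: Callable[[int], float],                # hypothetical offloaded routine
    cpu_step: Callable[[int, Optional[float]], float],  # hypothetical CPU work per step
) -> List[float]:
    """Run a time loop in which the offloaded result launched during step t
    is consumed by the CPU only at step t + 1 (None is passed at t = 0)."""
    outputs: List[float] = []
    pending: Optional[Future] = None
    # One worker models a single accelerator executing launches in order.
    with ThreadPoolExecutor(max_workers=1) as accelerator:
        for t in range(n_steps):
            # Collect the result launched during the previous step, if any.
            prev = pending.result() if pending is not None else None
            # Launch this step's offloaded work; it proceeds while the CPU works.
            pending = accelerator.submit(gpu_kernel, t)
            # CPU work for step t uses the lagged (previous-step) result.
            outputs.append(cpu_step(t, prev))
        # Drain the final launch so no work is left in flight.
        if pending is not None:
            pending.result()
    return outputs


def demo_gpu_kernel(t: int) -> float:
    # Placeholder computation standing in for an offloaded radiation routine.
    return 10.0 * t


def demo_cpu_step(t: int, prev: Optional[float]) -> float:
    # Placeholder CPU step that folds in the lagged result (0.0 at the first step).
    return (prev if prev is not None else 0.0) + t


if __name__ == "__main__":
    print(run_lagged_pipeline(4, demo_gpu_kernel, demo_cpu_step))
```

With the placeholder callables, step 2 sees the value launched at step 1 (10.0) and step 3 sees the value launched at step 2 (20.0), so the loop returns `[0.0, 1.0, 12.0, 23.0]`. The thread pool only models the scheduling and the one-step lag; it does not itself provide GPU parallelism.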
