Acceleration for CFD applications on large GPU clusters: An NPB case study

Computational fluid dynamics (CFD) applications place an ever-growing demand on high performance computing (HPC) infrastructure. Many CFD simulations have benefited from recently emerged GPU clusters. However, few of them exploit both the CPU and the GPU computational resources of heterogeneous HPC platforms. In this paper, we demonstrate an approach that enables large-scale CFD applications to benefit from GPU clusters. Taking the NAS Parallel Benchmarks (NPB) as an example, we implement several CFD kernels with our hybrid programming pattern MOC and run them on the TianHe-1A supercomputer. Experimental results show that: (1) CFD applications can achieve significant performance improvement on GPU clusters, even memory-bound ones such as CG; (2) embarrassingly parallel applications scale well with the number of compute nodes; and (3) overlapping data transfer over the PCI-E bus with kernel execution can greatly increase the performance and scalability of CFD applications.
