Title: A Fast Double Precision CFD Code using CUDA

We describe a second-order, double-precision finite volume Boussinesq code implemented on the CUDA platform. We validate the code in detail on a variety of Rayleigh-Bénard convection problems and demonstrate second-order convergence. Its results match those of a Fortran code running on a high-end eight-core CPU. The CUDA-accelerated code achieves approximately an eightfold speedup over the Fortran code on identical problems. As a result, we are able to run a simulation on a grid of size 384² × 192 at 1.6 seconds per time step on a machine with a single GPU.
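To make the two quantitative claims above concrete, here is a minimal sketch of how they might be checked. It is an illustration, not the authors' procedure: the function name `observed_order` and the error values are hypothetical, and the estimate assumes the error behaves like C·h^p under uniform grid refinement. Only the grid size and the time per step are taken from the abstract.

```python
import math


def observed_order(err_coarse, err_fine, refinement=2.0):
    """Estimate the convergence order p from errors on two grids.

    Assumes err ~ C * h**p, where the fine grid spacing is the coarse
    spacing divided by `refinement`.
    """
    return math.log(err_coarse / err_fine) / math.log(refinement)


# Synthetic errors, NOT taken from the paper: halving the grid spacing
# reduces the error by a factor of four, as a second-order scheme would.
p = observed_order(1.0e-3, 2.5e-4)

# Throughput implied by the figures quoted in the abstract:
# a 384 x 384 x 192 grid advanced in 1.6 seconds per time step.
cells = 384 * 384 * 192
cells_per_second = cells / 1.6

print(f"observed order: {p:.2f}")
print(f"cells: {cells}, cell updates per second: {cells_per_second:.3e}")
```

An observed order close to 2 under successive refinement is what "second-order convergence" means in practice, and the cell count shows that the quoted run advances roughly 28 million cells per time step.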
