Paper information - Title: A Fast Double Precision CFD Code using CUDA

We describe a second-order, double-precision finite volume Boussinesq code implemented on the CUDA platform. We validate the code in detail on a variety of Rayleigh-Bénard convection problems and demonstrate second-order convergence. Our results match those of a Fortran code running on a high-end eight-core CPU. The CUDA-accelerated code achieves approximately an eightfold speedup over the Fortran code on identical problems. As a result, we are able to run a simulation on a grid of size 384² × 192 at 1.6 seconds per time step on a machine with a single GPU.
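To put the reported figures in perspective, the following minimal sketch works out the cell count, the implied cell throughput, and the double-precision storage per scalar field. It assumes the grid size is read as 384 × 384 × 192 and that each field stores one 8-byte value per cell; the function name `grid_stats` and its parameters are hypothetical and do not come from the paper.

```python
def grid_stats(nx, ny, nz, seconds_per_step, bytes_per_value=8):
    """Back-of-envelope figures for a structured grid.

    Assumptions (not from the paper): one value per cell per field,
    bytes_per_value = 8 for IEEE double precision.
    """
    cells = nx * ny * nz
    return {
        "cells": cells,
        # Cells advanced per second of wall-clock time.
        "cells_per_second": cells / seconds_per_step,
        # Storage for a single scalar field, in MiB (2**20 bytes).
        "mib_per_field": cells * bytes_per_value / 2**20,
    }


if __name__ == "__main__":
    stats = grid_stats(384, 384, 192, seconds_per_step=1.6)
    for name, value in stats.items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the grid holds about 28.3 million cells, each time step advances roughly 17.7 million cells per second, and every scalar field occupies 216 MiB in double precision. How many such fields the solver keeps resident on the GPU is not stated in the abstract.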

Jonathan M. Cohen | M. Jeroen Molemaker
