On the Multi-GPU Computing of a Reconstructed Discontinuous Galerkin Method for Compressible Flows on 3D Hybrid Grids

A multi-GPU accelerated, third-order reconstructed discontinuous Galerkin method, RDG(P1P2), has been developed using OpenACC directives for compressible flows on 3D hybrid grids. The scheme requires minimal intrusion into an existing CPU code and minimal alteration of its algorithms, which makes it an efficient approach for adding GPU computing capability to a legacy CFD solver while maintaining its portability across multiple platforms. The grid is partitioned according to the number of GPUs, with the workload distributed equally among them. Communication between GPUs is handled by host-based MPI. A face renumbering and grouping algorithm eliminates the memory contention that arises from vectorized computing over the face loops on each individual GPU. A series of inviscid and viscous flow problems is presented for verification and scaling tests, demonstrating excellent scalability of the resulting GPU code. The numerical results indicate that this parallel RDG(P1P2) method is a cost-effective, high-order DG method for scalable computing on GPU clusters.
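
The abstract does not spell out the face renumbering and grouping algorithm, so the following is only a minimal sketch of one common way to realize the idea: a greedy grouping in which no two faces of the same group touch the same cell. Within such a group, a vectorized face loop can accumulate flux contributions into its left and right cells without two iterations writing to the same memory location. The function names `group_faces` and `renumber`, and the representation of a face as a `(left_cell, right_cell)` pair, are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Set, Tuple


def group_faces(faces: List[Tuple[int, int]]) -> List[List[int]]:
    """Greedily place each face in the first group where neither of its
    two cells has been touched yet (hypothetical helper, not the paper's code)."""
    groups: List[List[int]] = []        # face indices per group
    cells_in_group: List[Set[int]] = [] # cells already written by each group
    for f, (left, right) in enumerate(faces):
        for g, used in enumerate(cells_in_group):
            if left not in used and right not in used:
                groups[g].append(f)
                used.update((left, right))
                break
        else:
            # No existing group is conflict-free for this face: open a new one.
            groups.append([f])
            cells_in_group.append({left, right})
    return groups


def renumber(groups: List[List[int]]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return the new face ordering (old indices, group by group) and the
    half-open [start, end) index range that each group occupies in it."""
    order = [f for g in groups for f in g]
    ranges = []
    start = 0
    for g in groups:
        ranges.append((start, start + len(g)))
        start += len(g)
    return order, ranges


# Four cells connected in a ring by four faces.
faces = [(0, 1), (1, 2), (2, 3), (3, 0)]
groups = group_faces(faces)
order, ranges = renumber(groups)
print(groups)  # [[0, 2], [1, 3]]
print(order)   # [0, 2, 1, 3]
print(ranges)  # [(0, 2), (2, 4)]
```

After renumbering, the faces of each group are contiguous, so an outer loop over groups can run sequentially while the inner loop over a group's faces is handed to the accelerator as a single conflict-free parallel loop.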
