Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed to accelerate a production-level computational fluid dynamics (CFD) code. The latest generation of graphics hardware allows multiple processor cores to share a single GPU simultaneously through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. To scale and fully use the resources of these and next-generation machines, codes will need to employ some form of GPU sharing model, such as the one presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010 and reflects the analysis and results of that time.
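The concurrent kernel execution mentioned above can be illustrated with a minimal sketch. This is not the paper's actual sharing model; it is a hedged, simplified example assuming Fermi-class or newer hardware, where kernels launched into distinct CUDA streams within one context may overlap on the device. The kernel name `flux_kernel` and the per-"rank" loop are hypothetical stand-ins for the work each processor core would submit:

```cuda
// Minimal sketch: four independent work units sharing one GPU via streams.
// Assumes compute capability >= 2.0 (concurrent kernel execution).
#include <cuda_runtime.h>
#include <cstdio>

__global__ void flux_kernel(float* q, int n) {   // hypothetical per-rank work
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) q[i] = 0.5f * (q[i] + q[i] * q[i]);
}

int main() {
    const int nRanks = 4;                        // four "ranks" sharing the GPU
    const int n = 1 << 20;
    cudaStream_t stream[nRanks];
    float* q[nRanks];
    for (int r = 0; r < nRanks; ++r) {
        cudaStreamCreate(&stream[r]);
        cudaMalloc(&q[r], n * sizeof(float));
        // Launching into distinct streams lets the hardware scheduler
        // run these kernels concurrently instead of back-to-back.
        flux_kernel<<<(n + 255) / 256, 256, 0, stream[r]>>>(q[r], n);
    }
    for (int r = 0; r < nRanks; ++r) {
        cudaStreamSynchronize(stream[r]);
        cudaFree(q[r]);
        cudaStreamDestroy(stream[r]);
    }
    printf("done\n");
    return 0;
}
```

In a true multi-process sharing model such as the one this work studies, each MPI rank would hold its own context on the device rather than sharing streams in a single process; the stream-based sketch above only shows the overlap mechanism the hardware provides.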
