GPU Port of a Parallel Incompressible Navier-Stokes Solver Based on OpenACC and MVAPICH2

OpenACC is a directive-based programming standard that aims to provide a highly portable programming model for massively parallel accelerators, such as general-purpose graphics processing units (GPGPUs), Accelerated Processing Units (APUs), and the Many Integrated Core (MIC) architecture. The heterogeneous nature of these accelerators demands careful planning of data movement and novel parallel algorithms not commonly used in scientific computation. Following a concept similar to that of OpenMP, the directive-based approach of OpenACC hides many underlying implementation details, significantly reducing programming complexity and increasing code portability. However, many challenges remain, owing to the relatively narrow interconnect bandwidth among GPUs and the very fine granularity of the GPGPU architecture. The first is particularly restrictive when cross-node data exchange is involved in a cluster environment. Furthermore, GPGPU fine-grained parallelism conflicts with certain inherently serial algorithms, posing further restrictions on performance. In our study, an implicit multi-block incompressible Navier-Stokes solver is ported to GPGPUs using OpenACC and MVAPICH2. A performance analysis is carried out by profiling this solver on an InfiniBand cluster with NVIDIA GPUs, which helps identify the potential of directive-based GPU programming and directions for further improvement.
