A high-throughput hybrid task and data parallel Poisson solver for large-scale simulations of incompressible turbulent flows on distributed GPUs

Abstract The solution of the pressure Poisson equation arising in the numerical solution of the incompressible Navier–Stokes equations (INSE) is by far the most expensive part of the computational procedure, and often the major restricting factor for parallel implementations. Improvements in iterative linear solvers, e.g. Krylov subspace methods and multigrid preconditioners, have been applied successfully to the INSE on CPU-based parallel computers. These numerical schemes, however, do not necessarily perform well on GPUs, mainly due to differences in the hardware architecture. Our previous work using many P100 GPUs of a flagship supercomputer showed that porting a highly optimized MPI-parallel CPU-based INSE solver to GPUs significantly accelerates the underlying numerical algorithms, while the overall speedup remains limited (Zolfaghari et al., Comput. Phys. Commun., 244, 132-142, 2019). The performance loss was mainly due to the Poisson solver, particularly the V-cycle geometric multigrid preconditioner. We also observed that the pure compute time of the GPU kernels remained nearly constant as the grid size was increased. Motivated by these observations, we present herein an algebraically simpler, yet more advanced parallel implementation for the solution of the Poisson problem on large numbers of distributed GPUs. Data parallelism is achieved by using the classical Jacobi method with successive over-relaxation and an optimized iterative driver routine. Task parallelism is enhanced by minimizing GPU-GPU data exchanges as the iterations proceed, which reduces the communication overhead. The hybrid parallelism yields a nearly 300-fold reduction in time-to-solution, and thus in computational cost (measured in node-hours), for the Poisson problem compared to our best-case CPU-based parallel implementation, which uses a preconditioned BiCGSTAB method.
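The data-parallel kernel named above (Jacobi with over-relaxation) maps naturally onto a vectorized stencil update. As a minimal illustrative sketch, and not the paper's GPU implementation, the following NumPy routine applies a weighted (over-relaxed) Jacobi iteration to a 2D Poisson problem -∇²u = f with homogeneous Dirichlet boundaries; the function name, grid, relaxation factor `omega`, and stopping test are all assumptions for illustration:

```python
import numpy as np

def weighted_jacobi_poisson(f, h, omega=0.8, tol=1e-8, max_iter=30000):
    """Weighted (over-relaxed) Jacobi iteration for -lap(u) = f on a
    uniform grid with spacing h and homogeneous Dirichlet boundaries.
    Illustrative sketch only; the paper's production kernel runs on
    distributed GPUs with minimized halo exchanges between iterations."""
    u = np.zeros_like(f)
    it = 0
    for it in range(1, max_iter + 1):
        u_new = u.copy()
        # 5-point Jacobi sweep on interior points (vectorized stencil)
        u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                    + u[1:-1, :-2] + u[1:-1, 2:]
                                    + h * h * f[1:-1, 1:-1])
        res = np.max(np.abs(u_new - u))   # cheap update-based stopping test
        # relaxation: blend old and new iterates with factor omega
        u = (1.0 - omega) * u + omega * u_new
        if res < tol:
            break
    return u, it
```

On a GPU, each Jacobi sweep is embarrassingly parallel over grid points, which is why the method trades slower convergence per iteration for much higher throughput than Krylov methods with multigrid preconditioning.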
The Poisson solver is then embedded in a flow solver with an explicit third-order Runge–Kutta scheme for time integration, which has been previously ported to GPUs. The flow solver is validated and computationally benchmarked for the transition and decay of the Taylor–Green vortex at Re = 1600 and the flow around a solid sphere at Re_D = 3700. Good strong scaling is demonstrated for both benchmarks. Furthermore, nearly 70% lower electrical energy consumption than the CPU implementation is reported for the Taylor–Green vortex case. We finally deploy the solver for direct numerical simulation (DNS) of systolic flow in a bileaflet mechanical heart valve, and present new insight into the complex laminar-turbulent transition process in this prosthesis.
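The abstract names the time integrator only as an explicit third-order Runge–Kutta scheme. As a hedged sketch, here is one common explicit third-order variant, the three-stage SSP-RK3 scheme in Shu–Osher form; the paper's solver may use different (e.g. low-storage) coefficients, so the coefficients and the function name below are assumptions for illustration:

```python
def ssp_rk3_step(u, t, dt, rhs):
    """One step of the three-stage SSP-RK3 scheme (Shu-Osher form) for
    du/dt = rhs(t, u). Illustrative only: the coefficients are the
    classic SSP ones and are not necessarily those used in the paper's
    flow solver."""
    u1 = u + dt * rhs(t, u)                                  # stage 1
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(t + dt, u1))       # stage 2
    # stage 3: combine with weights 1/3 and 2/3 for third-order accuracy
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * rhs(t + 0.5 * dt, u2))
```

In an INSE solver of this kind, `rhs` would evaluate the discrete convective and viscous terms, with the pressure projection (the Poisson solve above) enforcing incompressibility at each stage or step.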

[1] S. Le Borne, et al., Block computation and representation of a sparse nullspace basis of a rectangular matrix, 2008.

[2] S. Le Borne, et al., Hierarchical matrix preconditioners for the Oseen equations, 2008.

[3] N. J. Quinlan, et al., Scale-up of an unsteady flow field for enhanced spatial and temporal resolution of PIV measurements: application to leaflet wake flow in a mechanical heart valve, 2011.

[4] P. Lötstedt, et al., High order accurate solution of the incompressible Navier–Stokes equations, 2005.

[5] J. Jeong, et al., On the identification of a vortex, J. Fluid Mech., 1995.

[6] P. Koumoutsakos, et al., A comparison of vortex and pseudo-spectral methods for the simulation of periodic vortical flows at high Reynolds numbers, J. Comput. Phys., 2011.

[7] H. A. van der Vorst, Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Comput., 1992.

[8] D. S. Henningson, et al., The fringe region technique and the Fourier method used in the direct numerical simulation of spatially evolving viscous flows, SIAM J. Sci. Comput., 1999.

[9] G. Smith, Numerical Solution of Partial Differential Equations: Finite Difference Methods, 1978.

[10] F. Sotiropoulos, et al., Curvilinear immersed boundary method for simulating fluid structure interaction with complex 3D rigid bodies, J. Comput. Phys., 2008.

[11] T. Colonius, et al., A fast immersed boundary method for external incompressible viscous flows using lattice Green's functions, J. Comput. Phys., 2016.

[12] M. Muradoglu, et al., Simulations of viscoelastic two-phase flows in complex geometries, 2017.

[13] E. Cormie-Bowins, A comparison of sequential and GPU implementations of iterative methods to compute reachability probabilities, GRAPHITE, 2012.

[14] H. Zolfaghari, et al., Absolute instability of impinging leading edge vortices in a submodel of a bileaflet mechanical heart valve, 2019.

[15] H. J. Kim, et al., Observations of the frequencies in a sphere wake and of drag increase by acoustic excitation, 1988.

[16] C. Hirsch, Numerical Computation of Internal and External Flows (Vol. 1: Fundamentals of Numerical Discretization), 1991.

[17] D. Obrist, et al., High-order accurate solution of the incompressible Navier–Stokes equations on massively parallel computers, J. Comput. Phys., 2010.

[18] R. Mittal, et al., A versatile sharp interface immersed boundary method for incompressible flows with complex boundaries, J. Comput. Phys., 2008.

[19] C. D. Pérez Segarra, et al., Direct numerical simulation of the flow over a sphere at Re = 3700, 2009.

[20] D. Obrist, et al., High-order accurate simulation of incompressible turbulent flows on many parallel GPUs of a hybrid-node supercomputer, Comput. Phys. Commun., 2019.