Acceleration of Turbomachinery Steady Simulations on GPU

Steady-state simulations in Computational Fluid Dynamics (CFD), which rely on implicit time integration, have so far seen only modest speedups on GPUs. Moreover, most reported acceleration efforts target the solution of the linear system of equations and neglect the potential gain from running the entire simulation on the GPU. In this paper, we present the software implementation of an implicit RANS CFD solver that runs fully on the GPU. We use the GMRES linear solver of the Paralution package, preconditioned with an incomplete LU factorization. We also propose a control mechanism, on-demand factorization, that reduces the number of times an incomplete LU factorization is performed. The on-demand factorization accelerates the linear solver without altering the flow convergence. The GPU implementation achieved speedups of 9.2x compared to a single-core CPU and 3.5x compared to a 4-core CPU for 3-D flow predictions in turbine applications.
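The abstract does not state how the on-demand factorization decides when to refactor, so the following is only a minimal sketch of one plausible policy, not the authors' method. It assumes the incomplete LU preconditioner is kept across nonlinear iterations and rebuilt only when the linear solver fails to converge or needs noticeably more iterations than it did right after the last factorization. The class name, the `factorize` callback, the `growth_limit` threshold, and the iteration counts in the demo are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class OnDemandFactorization:
    """Reuse a preconditioner across steps; rebuild it only when it degrades.

    All names and the trigger rule are illustrative assumptions.
    """
    factorize: Callable[[Any], Any]   # hypothetical: builds e.g. an ILU from a matrix
    growth_limit: float = 1.5         # assumed tolerance on linear-iteration growth
    preconditioner: Optional[Any] = None
    baseline_iters: Optional[int] = None  # iterations right after the last factorization
    stale: bool = True                # True means the next get() must refactor
    factorizations: int = 0           # how many times factorize() was called

    def get(self, matrix: Any) -> Any:
        """Return a preconditioner, refactoring only if the stored one is stale."""
        if self.stale:
            self.preconditioner = self.factorize(matrix)
            self.factorizations += 1
            self.baseline_iters = None
            self.stale = False
        return self.preconditioner

    def report(self, linear_iters: int, converged: bool = True) -> None:
        """Record the outcome of a linear solve and decide whether to refactor next."""
        if self.baseline_iters is None:
            # First solve with a fresh factorization defines the reference cost.
            self.baseline_iters = linear_iters
        degraded = linear_iters > self.growth_limit * self.baseline_iters
        self.stale = degraded or not converged


# Demo with made-up linear-iteration counts for eight nonlinear steps.
simulated_iters = [10, 11, 13, 16, 10, 12, 18, 10]
ctrl = OnDemandFactorization(factorize=lambda matrix: ("ILU", matrix))

for step, iters in enumerate(simulated_iters):
    M = ctrl.get(matrix=step)   # the step index stands in for the Jacobian
    ctrl.report(iters)          # a real code would run the linear solver with M here

print(ctrl.factorizations)      # 3 factorizations instead of 8
```

In a real solver the call to `report` would follow a preconditioned GMRES solve, and the trigger could equally be based on residual reduction or on how much the Jacobian has changed; the iteration-growth rule is used here only because it is simple to state.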
