A heterogeneous parallel Red–Black SOR technique and the numerical study on SIMPLE

A basic heterogeneous parallel Red–Black successive over-relaxation (SOR) implementation, the mono-color floating-point scheme, was developed for graphics processing units (GPUs) on the OpenCL platform. Built on fine-grained parallelism, a compact data structure, and a stencil function, the scheme uses a concise mapping relationship that implicitly encodes the complex rules for locating neighbor elements, which avoids the low GPU utilization of the traditional Red–Black SOR scheme. The new scheme was then used to build a fast Semi-Implicit Method for Pressure Linked Equations (SIMPLE) solver with OpenCL and OpenMP on a heterogeneous parallel computing device. Compared with a SIMPLE solver using the traditional Red–Black SOR scheme, the new scheme achieves a speedup of 1.7 to 1.8 on the same GPU. It also eliminates the complex searching module of the mono-color logical scheme and outperforms that scheme by 20–30%. Numerical cases in double precision showed that the GPU SIMPLE solver with the new Red–Black SOR scheme could save up to 92% of the computing time compared with the serial solver on a CPU.
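
For readers unfamiliar with the underlying iteration, the following is a minimal serial sketch of textbook Red–Black SOR for the discrete Poisson problem on a uniform 2D grid. It is not the mono-color floating-point scheme described above, whose data layout and neighbor mapping are not given here. The function name `red_black_sor`, the relaxation factor `omega = 1.5`, the 9 × 9 grid, and the test problem (Laplace's equation with boundary values x + y) are all assumptions chosen for illustration.

```python
def red_black_sor(u, f, h, omega=1.5, sweeps=200):
    """Textbook Red-Black SOR for -laplace(u) = f on a uniform grid.

    u is a list of lists updated in place; its boundary values stay fixed.
    """
    n, m = len(u), len(u[0])
    for _ in range(sweeps):
        for color in (0, 1):  # 0 = "red" points, 1 = "black" points
            for i in range(1, n - 1):
                # First interior column j with (i + j) % 2 == color.
                start = 1 + ((i + 1 + color) % 2)
                for j in range(start, m - 1, 2):
                    # Gauss-Seidel value from the four neighbors, which all
                    # have the opposite color, so same-color points are
                    # independent and could be updated in parallel.
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1]
                                 + h * h * f[i][j])
                    u[i][j] += omega * (gs - u[i][j])
    return u


# Illustrative test problem (assumed): Laplace's equation on the unit
# square with boundary values x + y, whose exact solution is u = x + y.
N = 9
h = 1.0 / (N - 1)
f = [[0.0] * N for _ in range(N)]
grid = [[(i + j) * h if i in (0, N - 1) or j in (0, N - 1) else 0.0
         for j in range(N)] for i in range(N)]
red_black_sor(grid, f, h)
err = max(abs(grid[i][j] - (i + j) * h) for i in range(N) for j in range(N))
```

Because each point depends only on neighbors of the other color, every point of one color can be assigned to its own GPU work-item. In this checkerboard layout half of the grid points are idle during each color pass, which is one common source of the low utilization that the abstract attributes to the traditional scheme.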
