An efficient wavefront parallel algorithm for structured three-dimensional LU-SGS

Abstract

Parallel computing is an important technology for scientific and engineering applications. LU-SGS (lower-upper symmetric Gauss–Seidel) is an efficient and robust scheme in computational fluid dynamics (CFD), but its computation has strong data dependences. In this paper, we present an efficient wavefront parallel algorithm for three-dimensional (3D) LU-SGS on structured meshes. We design a corresponding data structure and memory access scheme that improve data locality, together with communication optimizations. We report the performance of the algorithm for different problem sizes and discuss several performance issues. On one Intel Xeon E5540 CPU (4 cores), the speedup over a single E5540 core ranges from 2.23 to 2.95. On a distributed-memory cluster, the parallel efficiency relative to 32 processes reaches up to 72.69% with 128 processes and 35.68% with 1024 processes. A CFD simulation of the ONERA M6 wing demonstrates the effectiveness of the algorithm.
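Why LU-SGS has strong data dependences, and how a wavefront exposes parallelism anyway, can be illustrated with a scalar model. This is a simplified sketch, not the authors' implementation: it assumes the forward (lower) sweep at point (i, j, k) reads only the already-updated neighbours (i-1, j, k), (i, j-1, k) and (i, j, k-1), and it replaces the block flux-Jacobian coupling of real LU-SGS with hypothetical scalar constants `a` and `d`. Because every dependency lowers i + j + k by one, all points on a hyperplane i + j + k = const depend only on the previous hyperplane and can be updated concurrently.

```python
# Scalar model of the LU-SGS forward (lower) sweep on an nx*ny*nz structured
# grid. Assumption: each point holds one number instead of a 5x5 block, and
# the coupling `a` and diagonal `d` are hypothetical illustrative constants.


def _update(x, b, i, j, k, a, d):
    """Point update that reads the three upstream neighbours in i, j and k."""
    s = 0.0
    if i > 0:
        s += x[i - 1][j][k]
    if j > 0:
        s += x[i][j - 1][k]
    if k > 0:
        s += x[i][j][k - 1]
    return (b[i][j][k] + a * s) / d


def lower_sweep_lexicographic(b, nx, ny, nz, a=0.25, d=2.0):
    """Reference sweep in plain i-j-k loop order (inherently sequential)."""
    x = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                x[i][j][k] = _update(x, b, i, j, k, a, d)
    return x


def hyperplane(level, nx, ny, nz):
    """All grid points with i + j + k == level; they are mutually independent."""
    pts = []
    for i in range(max(0, level - (ny - 1) - (nz - 1)), min(nx - 1, level) + 1):
        rest = level - i
        for j in range(max(0, rest - (nz - 1)), min(ny - 1, rest) + 1):
            pts.append((i, j, rest - j))
    return pts


def lower_sweep_wavefront(b, nx, ny, nz, a=0.25, d=2.0):
    """Same sweep, visited hyperplane by hyperplane (wavefront order)."""
    x = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    # Levels run from 0 to (nx-1)+(ny-1)+(nz-1).
    for level in range(nx + ny + nz - 2):
        # Points on this plane read only values from level-1, so this inner
        # loop is the part a threaded or distributed version would split up.
        for i, j, k in hyperplane(level, nx, ny, nz):
            x[i][j][k] = _update(x, b, i, j, k, a, d)
    return x
```

Both sweeps return identical values, since each point's update reads only upstream values that are already final when it runs. The backward (upper) sweep is the mirror image, traversing the hyperplanes from the highest level down to 0.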
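The abstract does not state how its efficiency figures are defined. Assuming the common definition of relative parallel efficiency with a 32-process baseline, where $T_p$ is the wall-clock time on $p$ processes, the reported numbers translate into speedups as follows:

```latex
E_p = \frac{32\,T_{32}}{p\,T_p}, \qquad
S_{p/32} = \frac{T_{32}}{T_p} = \frac{p}{32}\,E_p

S_{128/32} = 4 \times 0.7269 \approx 2.91, \qquad
S_{1024/32} = 32 \times 0.3568 \approx 11.42

E_{\text{4 cores}} = \frac{2.23}{4} \approx 55.8\% \;\text{ to }\; \frac{2.95}{4} \approx 73.8\%
```

Under this assumed definition, going from 32 to 1024 processes (32 times more) would yield roughly an 11.4-fold reduction in run time, and the single-CPU speedups correspond to about 56% to 74% efficiency on 4 cores.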
