Parallelizing a high-order WENO scheme for complicated flow structures on GPU and MIC

As conservative, high-order accurate, shock-capturing methods, weighted essentially non-oscillatory (WENO) schemes have been widely used to resolve complicated flow structures in computational fluid dynamics (CFD) simulations. However, a high-order WENO scheme can be highly time-consuming, which severely limits the efficiency of CFD applications. In this paper, we present various parallel strategies for accelerating a high-order WENO scheme on the latest many-core platforms, namely the NVIDIA Fermi GPU, the NVIDIA Kepler GPU and the Intel MIC coprocessor. We also discuss in detail a comparison between the two GPU generations, Fermi and Kepler, and a cross-platform performance analysis focusing on the Kepler GPU and MIC. The experiments show that the Kepler GPU offers a clear advantage over the previous-generation Fermi GPU while running exactly the same source code. Furthermore, the Kepler GPU can be several times faster than MIC when MIC does not exploit the increasingly available SIMD computing power of its Vector Processing Unit (VPU), but MIC delivers computing capability equivalent to the Kepler GPU when the VPU is utilized. Our implementations and optimization techniques can serve as case studies for parallelizing high-order schemes on many-core architectures.
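The abstract does not state which WENO variant is parallelized. Purely as an illustration, the following Python sketch assumes the classical fifth-order WENO-JS reconstruction (Jiang and Shu) with linear weights (0.1, 0.6, 0.3) and epsilon = 1e-6; the function names `weno5_left` and `weno5_interfaces` are hypothetical. It shows why such a scheme maps well to many-core hardware: each interface value depends only on a fixed five-point stencil, so all interfaces can be computed independently.

```python
from typing import List

EPS = 1e-6                        # assumed regularization constant, avoids division by zero
LINEAR_WEIGHTS = (0.1, 0.6, 0.3)  # optimal linear weights d0, d1, d2 of WENO-JS (assumed variant)


def weno5_left(vm2: float, vm1: float, v0: float, vp1: float, vp2: float) -> float:
    """Left-biased value at x_{i+1/2} from cell values v_{i-2}, ..., v_{i+2}."""
    # Third-order candidate reconstructions on the three 3-point sub-stencils
    q0 = (2 * vm2 - 7 * vm1 + 11 * v0) / 6
    q1 = (-vm1 + 5 * v0 + 2 * vp1) / 6
    q2 = (2 * v0 + 5 * vp1 - vp2) / 6
    # Smoothness indicators: large when a sub-stencil crosses a discontinuity
    b0 = 13 / 12 * (vm2 - 2 * vm1 + v0) ** 2 + 0.25 * (vm2 - 4 * vm1 + 3 * v0) ** 2
    b1 = 13 / 12 * (vm1 - 2 * v0 + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13 / 12 * (v0 - 2 * vp1 + vp2) ** 2 + 0.25 * (3 * v0 - 4 * vp1 + vp2) ** 2
    # Unnormalized nonlinear weights
    a0 = LINEAR_WEIGHTS[0] / (EPS + b0) ** 2
    a1 = LINEAR_WEIGHTS[1] / (EPS + b1) ** 2
    a2 = LINEAR_WEIGHTS[2] / (EPS + b2) ** 2
    # Convex combination of the candidates with normalized weights
    return (a0 * q0 + a1 * q1 + a2 * q2) / (a0 + a1 + a2)


def weno5_interfaces(v: List[float]) -> List[float]:
    """Reconstruct at every interface x_{i+1/2} with a full stencil, i = 2 .. len(v) - 3."""
    # The iterations are independent of one another: a GPU version could assign one
    # thread per interface, and a MIC version could spread the loop over cores and
    # vectorize it on the VPU. Here it is a plain sequential loop.
    return [weno5_left(*v[i - 2:i + 3]) for i in range(2, len(v) - 2)]
```

For instance, `weno5_left(0, 0, 0, 1, 1)` returns a value very close to 0: the two sub-stencils that straddle the jump get large smoothness indicators, so almost all weight goes to the smooth stencil and no spurious oscillation appears at the discontinuity.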
