Acceleration of a High Order Finite-Difference WENO Scheme for Large-Scale Cosmological Simulations on GPU

In this work, we present our implementation of a three-dimensional, fifth-order finite-difference weighted essentially non-oscillatory (WENO) scheme in double precision on CPU/GPU clusters. The implementation targets large-scale cosmological hydrodynamic flow simulations that involve both shocks and complicated smooth solution structures. At the level of MPI parallelization, we subdivide the domain along each of the three axial directions; on each process, we then port the WENO computation to the GPU. The method is memory-bound, owing to the calculation of the nonlinear weights, and this becomes a greater challenge for a 3D high-order problem in double precision. To make full use of the computing power of the GPU while working within its memory limitations, we performed a series of optimizations focused on memory access patterns at all levels. We subjected the code to a number of typical tests to evaluate its effectiveness and efficiency. Relative to a single-threaded Fortran reference code, the GPU version achieves an overall speed-up of 12 to 19 times, and of about 19 to 36 times in the computation part. We analyze the results on both Fermi and Kepler GPUs. We also outline what is needed to further increase the speed by reducing the time spent in the communication part, along with other future work.
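To make the weight calculation mentioned above concrete, here is a minimal sketch of a single fifth-order WENO reconstruction at one cell interface. It assumes the classical Jiang and Shu formulation (linear weights 1/10, 6/10, 3/10 and `eps = 1e-6`); the abstract does not say which variant the authors use. The function name `weno5_reconstruct` and its argument names are hypothetical, chosen for illustration only.

```python
def weno5_reconstruct(fm2, fm1, f0, fp1, fp2, eps=1e-6):
    """Left-biased 5th-order WENO value at interface i+1/2.

    Inputs are the five stencil values f[i-2], ..., f[i+2].
    Assumption: classical Jiang-Shu weights; not necessarily the
    exact variant used in the paper.
    """
    # Third-order candidate reconstructions on the three sub-stencils.
    q0 = (2.0 * fm2 - 7.0 * fm1 + 11.0 * f0) / 6.0
    q1 = (-fm1 + 5.0 * f0 + 2.0 * fp1) / 6.0
    q2 = (2.0 * f0 + 5.0 * fp1 - fp2) / 6.0

    # Smoothness indicators: large where the sub-stencil crosses a shock.
    b0 = 13.0 / 12.0 * (fm2 - 2.0 * fm1 + f0) ** 2 \
        + 0.25 * (fm2 - 4.0 * fm1 + 3.0 * f0) ** 2
    b1 = 13.0 / 12.0 * (fm1 - 2.0 * f0 + fp1) ** 2 \
        + 0.25 * (fm1 - fp1) ** 2
    b2 = 13.0 / 12.0 * (f0 - 2.0 * fp1 + fp2) ** 2 \
        + 0.25 * (3.0 * f0 - 4.0 * fp1 + fp2) ** 2

    # Nonlinear weights: linear weights scaled down on non-smooth stencils.
    a0 = 0.1 / (eps + b0) ** 2
    a1 = 0.6 / (eps + b1) ** 2
    a2 = 0.3 / (eps + b2) ** 2
    total = a0 + a1 + a2

    return (a0 * q0 + a1 * q1 + a2 * q2) / total
```

Each interface value reads five neighbouring values but performs only a few dozen floating-point operations, and in 3D this is repeated per direction and per conserved variable. That ratio of memory traffic to arithmetic is consistent with the memory-bound behaviour described above.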
