Methods for compressible fluid simulation on GPUs using high-order finite differences

Abstract: We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge–Kutta integration. Graphics processing units perform well in data-parallel tasks, which makes them an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both main memory and the caches of the GPU. We present two approaches for simulating compressible fluids, based on 55-point and 19-point stencils. We seek to reduce the memory-bandwidth and cache-size requirements of our methods by using cache blocking and by decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU, achieving a 3.6× speedup over a comparable hydrodynamics solver benchmarked on two Intel Xeon E5-2690v3 processors. Our alternative GPU implementation is latency-bound and achieves a rate of 168 million updates per second.
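To illustrate the kind of kernel the abstract describes, below is a minimal CUDA sketch of a sixth-order central difference along x with shared-memory cache blocking. The grid dimensions, block size, and names (ddx_6th, NX, BLOCK_X, RADIUS) are illustrative assumptions, not the paper's implementation; the full solver additionally couples such stencils with third-order Runge–Kutta time integration and a full set of compressible-flow equations.

```cuda
// Hedged sketch: sixth-order central difference in x with shared-memory
// cache blocking. All sizes and names here are assumptions for illustration.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

#define NX 64          // assumed grid size; NX must be a multiple of BLOCK_X
#define NY 64
#define NZ 64
#define RADIUS 3       // sixth-order stencil needs 3 halo points per side
#define BLOCK_X 32

__global__ void ddx_6th(const double* __restrict__ f,
                        double* __restrict__ dfdx,
                        double inv_dx)
{
    // Shared-memory tile covering one block row in x plus halos (cache blocking).
    __shared__ double tile[BLOCK_X + 2 * RADIUS];

    const int i = blockIdx.x * blockDim.x + threadIdx.x;  // x index
    const int j = blockIdx.y;                             // y index
    const int k = blockIdx.z;                             // z index
    const long idx = (long)k * NX * NY + (long)j * NX + i;
    const int s = threadIdx.x + RADIUS;

    // Each thread loads its own point; edge threads also load the halos.
    tile[s] = f[idx];
    if (threadIdx.x < RADIUS) {
        tile[s - RADIUS]  = (i >= RADIUS)      ? f[idx - RADIUS]  : 0.0;
        tile[s + BLOCK_X] = (i + BLOCK_X < NX) ? f[idx + BLOCK_X] : 0.0;
    }
    __syncthreads();

    // Skip the global boundary; a real solver would apply boundary conditions.
    if (i < RADIUS || i >= NX - RADIUS) return;

    // Sixth-order central difference: coefficients (-1, 9, -45, 0, 45, -9, 1) / 60.
    dfdx[idx] = inv_dx * ( (tile[s + 3] - tile[s - 3])
                   - 9.0 * (tile[s + 2] - tile[s - 2])
                  + 45.0 * (tile[s + 1] - tile[s - 1]) ) / 60.0;
}

int main()
{
    const long n = (long)NX * NY * NZ;
    const double dx = 2.0 * 3.141592653589793 / NX;

    // Initialize f = sin(x) so the x-derivative should approximate cos(x).
    double *h_f = new double[n], *h_d = new double[n];
    for (int k = 0; k < NZ; ++k)
        for (int j = 0; j < NY; ++j)
            for (int i = 0; i < NX; ++i)
                h_f[(long)k * NX * NY + (long)j * NX + i] = sin(i * dx);

    double *d_f, *d_d;
    cudaMalloc(&d_f, n * sizeof(double));
    cudaMalloc(&d_d, n * sizeof(double));
    cudaMemcpy(d_f, h_f, n * sizeof(double), cudaMemcpyHostToDevice);

    dim3 block(BLOCK_X, 1, 1), grid(NX / BLOCK_X, NY, NZ);
    ddx_6th<<<grid, block>>>(d_f, d_d, 1.0 / dx);
    cudaMemcpy(h_d, d_d, n * sizeof(double), cudaMemcpyDeviceToHost);

    printf("d/dx sin(x) at i=10: %f (exact %f)\n", h_d[10], cos(10 * dx));

    cudaFree(d_f); cudaFree(d_d);
    delete[] h_f; delete[] h_d;
    return 0;
}
```

The tile buffer is the cache-blocking step: each block stages its slab of the field plus a 3-point halo in shared memory, so the seven reads per stencil hit fast on-chip storage instead of issuing redundant global-memory loads.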
