Methods for compressible fluid simulation on GPUs using high-order finite differences

Abstract: We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge–Kutta integration. Graphics processing units perform well in data-parallel tasks, which makes them an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both main memory and the caches of the GPU. We present two approaches for simulating compressible fluids, based on 55-point and 19-point stencils. We seek to reduce the memory-bandwidth and cache-size requirements of our methods by using cache blocking and by decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU, achieving a 3.6× speedup over a comparable hydrodynamics solver benchmarked on two Intel Xeon E5-2690v3 processors. Our alternative GPU implementation is latency-bound and achieves a rate of 168 million updates per second.
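To illustrate the kind of kernel the abstract describes, below is a minimal CUDA sketch of a sixth-order central difference along x with shared-memory cache blocking. The grid dimensions, block size, and names (ddx_6th, NX, BLOCK_X, RADIUS) are illustrative assumptions, not the paper's implementation; the full solver additionally couples such stencils with third-order Runge–Kutta time integration and a full set of compressible-flow equations.

```cuda
// Hedged sketch: sixth-order central difference in x with shared-memory
// cache blocking. All sizes and names here are assumptions for illustration.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

#define NX 64          // assumed grid size; NX must be a multiple of BLOCK_X
#define NY 64
#define NZ 64
#define RADIUS 3       // sixth-order stencil needs 3 halo points per side
#define BLOCK_X 32

__global__ void ddx_6th(const double* __restrict__ f,
                        double* __restrict__ dfdx,
                        double inv_dx)
{
    // Shared-memory tile covering one block row in x plus halos (cache blocking).
    __shared__ double tile[BLOCK_X + 2 * RADIUS];

    const int i = blockIdx.x * blockDim.x + threadIdx.x;  // x index
    const int j = blockIdx.y;                             // y index
    const int k = blockIdx.z;                             // z index
    const long idx = (long)k * NX * NY + (long)j * NX + i;
    const int s = threadIdx.x + RADIUS;

    // Each thread loads its own point; edge threads also load the halos.
    tile[s] = f[idx];
    if (threadIdx.x < RADIUS) {
        tile[s - RADIUS]  = (i >= RADIUS)      ? f[idx - RADIUS]  : 0.0;
        tile[s + BLOCK_X] = (i + BLOCK_X < NX) ? f[idx + BLOCK_X] : 0.0;
    }
    __syncthreads();

    // Skip the global boundary; a real solver would apply boundary conditions.
    if (i < RADIUS || i >= NX - RADIUS) return;

    // Sixth-order central difference: coefficients (-1, 9, -45, 0, 45, -9, 1) / 60.
    dfdx[idx] = inv_dx * ( (tile[s + 3] - tile[s - 3])
                   - 9.0 * (tile[s + 2] - tile[s - 2])
                  + 45.0 * (tile[s + 1] - tile[s - 1]) ) / 60.0;
}

int main()
{
    const long n = (long)NX * NY * NZ;
    const double dx = 2.0 * 3.141592653589793 / NX;

    // Initialize f = sin(x) so the x-derivative should approximate cos(x).
    double *h_f = new double[n], *h_d = new double[n];
    for (int k = 0; k < NZ; ++k)
        for (int j = 0; j < NY; ++j)
            for (int i = 0; i < NX; ++i)
                h_f[(long)k * NX * NY + (long)j * NX + i] = sin(i * dx);

    double *d_f, *d_d;
    cudaMalloc(&d_f, n * sizeof(double));
    cudaMalloc(&d_d, n * sizeof(double));
    cudaMemcpy(d_f, h_f, n * sizeof(double), cudaMemcpyHostToDevice);

    dim3 block(BLOCK_X, 1, 1), grid(NX / BLOCK_X, NY, NZ);
    ddx_6th<<<grid, block>>>(d_f, d_d, 1.0 / dx);
    cudaMemcpy(h_d, d_d, n * sizeof(double), cudaMemcpyDeviceToHost);

    printf("d/dx sin(x) at i=10: %f (exact %f)\n", h_d[10], cos(10 * dx));

    cudaFree(d_f); cudaFree(d_d);
    delete[] h_f; delete[] h_d;
    return 0;
}
```

The tile buffer is the cache-blocking step: each block stages its slab of the field plus a 3-point halo in shared memory, so the seven reads per stencil hit fast on-chip storage instead of issuing redundant global-memory loads.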
