A parallel lattice Boltzmann method for large eddy simulation on multiple GPUs

To improve the efficiency of simulating turbulent fluid flows at high Reynolds numbers with large eddy dynamics, we propose a CUDA-based implementation of the lattice Boltzmann method (LBM) for large eddy simulation (LES) on multiple graphics processing units (GPUs). Our solution adopts the “collision after propagation” lattice evolution scheme and performs the misaligned propagation phase during the global memory read. The latest GPU platform allows a single CPU thread to control up to four GPUs running in parallel. To make use of multiple GPUs, the whole working set is evenly partitioned into sub-domains. We implement the Smagorinsky model and the Vreman model separately to verify our multi-GPU solution. These two LES models compute the relaxation time differently, which leads to different CUDA implementation characteristics. The implementation based on the Smagorinsky model achieves a 190-fold speedup over the sequential CPU implementation, while the implementation based on the Vreman model achieves a speedup of more than 90 times. The experimental results show that the parallel performance of our solution scales very well across multiple GPUs. Large-scale LES–LBM simulation (lattices of up to 10,240 $\times$ 10,240 nodes) therefore becomes possible at low cost, even with double-precision floating-point arithmetic.
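As an illustration of the “collision after propagation” scheme with a Smagorinsky-type relaxation time, the following is a minimal sketch in plain Python rather than CUDA. It is not the authors' implementation. It assumes a D2Q9 lattice, periodic boundaries, a single-relaxation-time (BGK) collision, and the commonly used closed form $\tau = \tfrac{1}{2}\left(\tau_0 + \sqrt{\tau_0^2 + 18\sqrt{2}\,(C\Delta)^2 Q/\rho}\right)$ with $Q = \sqrt{\Pi_{ij}\Pi_{ij}}$. The names `pull_step`, `smagorinsky_tau`, `c_smag`, and `delta`, as well as the value `c_smag=0.1`, are hypothetical choices made for this example.

```python
import math

# D2Q9 lattice: discrete velocities and their standard weights.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for one node."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def smagorinsky_tau(tau0, rho, f, feq, c_smag=0.1, delta=1.0):
    """Effective relaxation time from the non-equilibrium momentum flux.

    c_smag and delta are assumed values, not taken from the paper.
    """
    pxx = pxy = pyy = 0.0
    for (ex, ey), fi, fe in zip(E, f, feq):
        fneq = fi - fe
        pxx += ex * ex * fneq
        pxy += ex * ey * fneq
        pyy += ey * ey * fneq
    # Q = sqrt(Pi_ij Pi_ij); the off-diagonal term appears twice.
    q = math.sqrt(pxx * pxx + 2.0 * pxy * pxy + pyy * pyy)
    eddy = 18.0 * math.sqrt(2.0) * (c_smag * delta) ** 2 * q / rho
    return 0.5 * (tau0 + math.sqrt(tau0 * tau0 + eddy))


def pull_step(f_old, nx, ny, tau0):
    """One 'collision after propagation' update on a periodic lattice."""
    f_new = [[None] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            # Propagation happens in the read: gather each f_i from the
            # upstream neighbour (the misaligned access on a GPU).
            f = [f_old[(y - ey) % ny][(x - ex) % nx][i]
                 for i, (ex, ey) in enumerate(E)]
            rho = sum(f)
            ux = sum(fi * ex for fi, (ex, _) in zip(f, E)) / rho
            uy = sum(fi * ey for fi, (_, ey) in zip(f, E)) / rho
            feq = equilibrium(rho, ux, uy)
            tau = smagorinsky_tau(tau0, rho, f, feq)
            # Collision, then a write to the node's own location.
            f_new[y][x] = [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]
    return f_new
```

In a CUDA version of this sketch, each thread would typically handle one node, so the gather in `pull_step` corresponds to the misaligned global memory read, while the final write goes to the thread's own node and stays aligned.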
