Scalable multi-relaxation-time lattice Boltzmann simulations on multi-GPU cluster

In this paper, the D3Q19 multi-relaxation-time lattice Boltzmann model is adopted to simulate three-dimensional cavity flows on graphics processing units (GPUs). For single-GPU computations, utilizing on-chip memory yields a three- to five-fold speedup over using global memory alone. In addition, streaming with offset reading attains a further two-fold speedup over streaming with offset writing. For Message Passing Interface (MPI) based multi-GPU computations, overlapping communication with computation achieves a 38% improvement and provides an efficient scheme for improving scalability. Numerical experiments show that 12 Tesla M2070 GPUs produce around 5500 million lattice updates per second (MLUPS) on a 576³ grid. On the other hand, three GTX Titans deliver 5000 MLUPS on a 192³ grid, while the 12 Tesla GPUs attain half that performance.
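The difference between offset writing and offset reading in the streaming step can be sketched on a toy lattice. The example below is only an illustration under stated assumptions, not the authors' CUDA kernels: it uses a one-dimensional periodic lattice with three velocities instead of D3Q19, plain Python lists instead of GPU memory, and the hypothetical names `VELOCITIES`, `stream_push`, and `stream_pull`. Both variants produce the same post-streaming distributions. They differ only in whether the shifted index appears on the write side or the read side, which is the memory access pattern the reported speedup depends on.

```python
# Toy 1D periodic lattice with three velocities (rest, right, left).
# Hypothetical names for illustration; the paper itself uses D3Q19 on GPUs.
VELOCITIES = (0, 1, -1)


def stream_push(f):
    """Offset writing: read f[i][x] at the local site, write to site x + c_i."""
    n = len(f[0])
    out = [[0.0] * n for _ in VELOCITIES]
    for i, c in enumerate(VELOCITIES):
        for x in range(n):
            out[i][(x + c) % n] = f[i][x]  # shifted (offset) write
    return out


def stream_pull(f):
    """Offset reading: read from site x - c_i, write to the local site x."""
    n = len(f[0])
    out = [[0.0] * n for _ in VELOCITIES]
    for i, c in enumerate(VELOCITIES):
        for x in range(n):
            out[i][x] = f[i][(x - c) % n]  # shifted (offset) read
    return out


# Distinct values per direction and site so that any mismatch would be visible.
f = [[float(10 * i + x) for x in range(5)] for i in range(len(VELOCITIES))]

pushed = stream_push(f)
pulled = stream_pull(f)
assert pushed == pulled  # both orderings give the same streamed state
```

In a GPU kernel, each thread would handle one site `x`. In the pull form, all writes then go to the thread's own site and only the reads are shifted.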
