Global Memory Access Modelling for Efficient Implementation of the LBM on GPUs

In this work, we investigate the global memory access mechanism of recent GPUs. For this study, we wrote dedicated benchmark programs that allowed us to explore how global memory transactions are scheduled. From these observations, we formulate a model capable of estimating the execution time of a large class of applications. Our main goal is to facilitate the optimisation of regular data-parallel applications on GPUs. As an example, we describe our CUDA implementations of LBM flow solvers, for which the model estimated performance with less than 5% relative error.
