Comparison of different propagation steps for lattice Boltzmann methods

The propagation step of lattice Boltzmann methods can be implemented in several ways. This paper describes common implementations and compares the number of memory transfer operations they require per lattice node update. A performance model based on memory bandwidth is then used to estimate the maximum achievable performance on different machines. A subset of the discussed implementations of the propagation step is benchmarked on different Intel- and AMD-based compute nodes using the framework of an existing flow solver that is specially adapted to simulate flow in porous media, and the model is validated against the measurements. Advanced approaches for the propagation step, such as the "A-A pattern" or "Esoteric Twist", require more programming effort but often sustain significantly better performance than non-naive but straightforward implementations.
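The bandwidth-based estimate can be illustrated with a minimal sketch. It assumes that a memory-bound kernel can update at most (memory bandwidth) / (bytes moved per lattice node update) nodes per second. The concrete numbers are assumptions for illustration and are not taken from the abstract: a D3Q19 lattice (19 distribution values per node), double precision (8 bytes per value), a hypothetical sustained bandwidth of 40 GB/s, and three memory transfers per value (load, write-allocate, store) for a straightforward two-grid scheme versus two (load, store) for an in-place scheme. The function name `max_mlups` is hypothetical.

```python
def max_mlups(bandwidth_bytes_per_s, q, bytes_per_value, transfers_per_value):
    """Upper bound on million lattice node updates per second (MLUPS)
    for a kernel limited purely by memory bandwidth.

    q                    -- number of distribution values per node (assumed 19 below)
    bytes_per_value      -- size of one value in bytes (8 for double precision)
    transfers_per_value  -- memory transfers per value and update (assumption)
    """
    bytes_per_update = q * bytes_per_value * transfers_per_value
    return bandwidth_bytes_per_s / bytes_per_update / 1e6


if __name__ == "__main__":
    bandwidth = 40e9  # hypothetical sustained memory bandwidth in bytes/s

    # Assumed traffic: load + write-allocate + store for each value.
    two_grid = max_mlups(bandwidth, q=19, bytes_per_value=8, transfers_per_value=3)

    # Assumed traffic: load + store only, as for an in-place update scheme.
    in_place = max_mlups(bandwidth, q=19, bytes_per_value=8, transfers_per_value=2)

    print(f"two-grid with write-allocate: {two_grid:.1f} MLUPS")
    print(f"in-place, no write-allocate:  {in_place:.1f} MLUPS")
```

Under these assumptions the bound depends only on the bytes moved per update, so a scheme that avoids one of the three transfers raises the attainable ceiling by a factor of 1.5 on the same machine.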
