Performance Analysis of the Lattice Boltzmann Method on x86-64 Architectures

The Lattice Boltzmann method (LBM) is a well-established algorithm for simulating fluid flow. This performance analysis is motivated by two observations: today's 3D simulation problems are complex enough to require long computation times, and a standard implementation of the LBM achieves only a small fraction of the potential performance of a modern CPU. We show that it is crucial to combine newer CPU architectural features, such as software prefetching and SIMD instruction set extensions, with established cache-blocking techniques in order to utilize the computational power of modern CPUs.