Optimizing Performance of the Lattice Boltzmann Method for Complex Structures on Cache-based Architectures

Delivering high sustained performance for memory-intensive applications in computational fluid dynamics on cache-based microprocessors is a long-standing challenge. In particular, irregular data access patterns, such as those arising from porous media flow in lattice Boltzmann codes, can lead to poor performance. To address this problem, we combine a 1-D list data representation with advanced code optimizations and achieve a high level of performance that is largely independent of the geometry and the obstacle/fluid ratio. We also test the idea of traversing memory along space-filling curves, but our results indicate that this approach alone cannot compete with standard techniques, i.e., blocking and data layout optimization, which become architecture dependent when an indirect memory addressing scheme is used.
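
To make the 1-D list representation with indirect addressing concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a 2-D D2Q9 lattice, a push-style propagation step, and half-way bounce-back at solid cells and at the domain boundary; the names `VELOCITIES`, `OPPOSITE`, `build_fluid_list`, and `stream` are hypothetical and chosen for illustration only.

```python
# Assumed D2Q9 velocity set: rest, four axis directions, four diagonals.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
# OPPOSITE[q] is the index of the velocity pointing against VELOCITIES[q].
OPPOSITE = [0, 3, 4, 1, 2, 7, 8, 5, 6]


def build_fluid_list(is_fluid):
    """Store only fluid cells in a 1-D list and build an adjacency table.

    is_fluid[y][x] is True for fluid cells. Returns (cells, neighbors):
    cells[i] is the (x, y) position of fluid cell i, and neighbors[i][q] is
    the list index of the cell reached along VELOCITIES[q], or -1 if that
    cell is solid or lies outside the domain.
    """
    index_of = {}
    cells = []
    for y, row in enumerate(is_fluid):
        for x, fluid in enumerate(row):
            if fluid:
                index_of[(x, y)] = len(cells)
                cells.append((x, y))
    neighbors = [[index_of.get((x + dx, y + dy), -1) for dx, dy in VELOCITIES]
                 for x, y in cells]
    return cells, neighbors


def stream(f_src, neighbors):
    """Propagation step using indirect addressing through the adjacency table."""
    f_dst = [[0.0] * len(VELOCITIES) for _ in f_src]
    for i, row in enumerate(neighbors):
        for q, j in enumerate(row):
            if j >= 0:
                # Fluid neighbor: push the population to it.
                f_dst[j][q] = f_src[i][q]
            else:
                # Solid or missing neighbor: reflect the population (bounce-back).
                f_dst[i][OPPOSITE[q]] = f_src[i][q]
    return f_dst
```

In this sketch, memory and loop length scale with the number of fluid cells rather than with the bounding box, which is what makes the cost largely independent of the obstacle/fluid ratio. The order in which `build_fluid_list` enumerates cells (row-major here) is the point at which a blocked or space-filling-curve ordering could be substituted.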