Performance Evaluation of an OpenCL Implementation of the Lattice Boltzmann Method on the Intel Xeon Phi

A portable OpenCL implementation of the lattice Boltzmann method targeting emerging many-core architectures is described. The main purpose of this work is to evaluate and compare the performance of this code on three mainstream hardware architectures available today, namely an Intel CPU, an Nvidia GPU, and the Intel Xeon Phi. Because of the similarities between OpenCL and CUDA, we chose to follow some of the strategies devised to implement efficient lattice Boltzmann solvers on Nvidia GPUs, while remaining as generic as possible. Because the program is fairly configurable, it makes it possible to determine the best options for each hardware platform. The achieved performance is quite satisfactory for both the CPU and the GPU. For the Xeon Phi, however, the results are below expectations. Nevertheless, comparison with data from the literature suggests that on this architecture the code is memory-bound.
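
Whether a lattice Boltzmann kernel is memory-bound can be checked against a simple bandwidth estimate. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes a D3Q19 stencil (the abstract does not name the stencil), exactly one read and one write of every particle distribution per node update, and no other memory traffic. The function name `max_mlups` is hypothetical.

```python
def max_mlups(bandwidth_gb_s: float, q: int = 19, bytes_per_value: int = 4) -> float:
    """Upper bound on million lattice node updates per second (MLUPS)
    for a purely memory-bound LBM kernel.

    Assumptions (illustrative only): each node update reads and writes
    each of the q particle distributions exactly once, with no cache
    reuse and no additional traffic (e.g. no separate geometry flags).
    """
    # One load plus one store per distribution value.
    bytes_per_update = 2 * q * bytes_per_value
    # Convert GB/s to B/s, divide by bytes per update, express in millions.
    return bandwidth_gb_s * 1e9 / bytes_per_update / 1e6


if __name__ == "__main__":
    # Hypothetical sustained bandwidths, not measured values for any device.
    for bw in (50.0, 150.0):
        print(f"{bw:6.1f} GB/s -> at most {max_mlups(bw):7.1f} MLUPS (D3Q19, single precision)")
```

Under these assumptions, a single-precision D3Q19 update moves 2 × 19 × 4 = 152 bytes, so a device sustaining 152 GB/s could reach at most 1000 MLUPS. Measured performance close to such a bound suggests the kernel is limited by memory bandwidth rather than by arithmetic.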
