A Parallel Implementation of a Lattice Boltzmann Method on the ClearSpeed Advance™ Accelerator Board

Coprocessor and multicore technologies represent the main current line of development for compute server architectures in high performance computing. The primary challenge lies in exploiting the associated computing power for highly CPU time-consuming applications in scientific computing. In this paper, we analyze specific methods adapted to the ClearSpeed Advance™ accelerator board for the numerical solution of problems in computational fluid dynamics (CFD) by means of the lattice Boltzmann method. The main emphasis is on evaluating this new technology with respect to sustained performance and efficiency. The ClearSpeed Advance™ accelerator board is a PCI-X card equipped with two CSX600 processors, each of which holds 96 processing element cores. Each processing element supports 64-bit (double precision) IEEE 754 floating point operations, which makes the board attractive for applications in numerical simulation. The parallelization paradigm considered here involves new concepts related to the distinction between poly and mono variables. As a model problem for our examination of the ClearSpeed Advance™ accelerator board, we consider the simulation of fluid flow in a cuboid, known as the lid-driven cavity. A suitably parallelized version of the lattice Boltzmann method is applied. Lattice Boltzmann methods are known to be perfectly suited for parallel architectures with high computing power due to the locality of the interactions involved. However, in the application considered, the solution process relies on a huge amount of data that needs to propagate along the underlying mesh. This fact, which is prototypical for this type of problem, exposes the current internal communication bandwidth of the accelerator board as the bottleneck.
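
To make the contrast between local interactions and data propagation concrete, the following is a minimal sketch of one lattice Boltzmann time step, not the implementation evaluated in this paper. It rests on several assumptions made only for illustration: a two-dimensional D2Q9 lattice instead of the three-dimensional cuboid, a single-relaxation-time (BGK) collision, periodic boundaries instead of cavity walls and a moving lid, and plain Python lists instead of accelerator memory. The names `collide`, `stream`, `make_grid`, and `omega` are hypothetical.

```python
# Assumption: D2Q9 lattice with its standard velocities and weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for a single node."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide(f, omega):
    """Local step: each node reads and writes only its own 9 populations."""
    for x in range(len(f)):
        for y in range(len(f[0])):
            node = f[x][y]
            rho = sum(node)
            ux = sum(c[0] * fi for c, fi in zip(C, node)) / rho
            uy = sum(c[1] * fi for c, fi in zip(C, node)) / rho
            feq = equilibrium(rho, ux, uy)
            f[x][y] = [fi - omega * (fi - fe) for fi, fe in zip(node, feq)]


def stream(f):
    """Non-local step: every population moves to a neighbouring node.

    Assumption: periodic wrap-around, chosen only to keep the sketch short.
    """
    nx, ny = len(f), len(f[0])
    out = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for i, (cx, cy) in enumerate(C):
                out[(x + cx) % nx][(y + cy) % ny][i] = f[x][y][i]
    return out


def total_mass(f):
    """Sum of all populations over the whole grid."""
    return sum(sum(sum(node) for node in column) for column in f)


def make_grid(nx, ny):
    """Fluid at rest with a small density bump at the centre (hypothetical)."""
    f = [[equilibrium(1.0, 0.0, 0.0) for _ in range(ny)] for _ in range(nx)]
    f[nx // 2][ny // 2] = equilibrium(1.1, 0.0, 0.0)
    return f


if __name__ == "__main__":
    grid = make_grid(8, 8)
    for _ in range(5):
        collide(grid, 1.0)   # purely local arithmetic
        grid = stream(grid)  # moves every population by one lattice link
    print(total_mass(grid))
```

In this sketch `collide` touches only node-local data, which is the property that makes the method attractive for many independent processing elements. `stream`, by contrast, performs no arithmetic at all and moves every population once per time step; this is the kind of mesh-wide data propagation that stresses communication bandwidth.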