Paper Information - Early Experience on Porting and Running a Lattice Boltzmann Code on the Xeon-Phi Co-Processor

Abstract: In this paper we report on our early experience in porting, optimizing and benchmarking a Lattice Boltzmann (LB) code on the Xeon-Phi co-processor, the first generally available version of the new Many Integrated Core (MIC) architecture developed by Intel. As a test-bed we consider a state-of-the-art LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equations of state of a perfect gas. The regular structure of LB algorithms makes it relatively easy to identify a large degree of available parallelism. However, mapping a large fraction of this parallelism onto this new class of processors is not straightforward. The D2Q37 LB algorithm considered in this paper is an appropriate test-bed for this architecture, since its critical computing kernels require high performance both in memory bandwidth for sparse memory access patterns and in number-crunching capability. We describe our implementation of the code, which builds on previous experience gained on other (simpler) many-core processors and GPUs, present benchmark results and performance measurements, and finally compare them with the results obtained by previous implementations developed for state-of-the-art classic multi-core CPUs and GP-GPUs.

[1] Federico Toschi, et al. Lattice Boltzmann methods for thermal flows: Continuum limit and applications to compressible Rayleigh-Taylor systems, 2010, 1005.3639.

[2] J. Boon. The Lattice Boltzmann Equation for Fluid Dynamics and Beyond, 2003.

[3] Ulrich Rüde, et al. Optimization and Profiling of the Cache Performance of Parallel Lattice Boltzmann Codes in 2D and 3D, 2003.

[4] L. Biferale, et al. Lattice Boltzmann method with self-consistent thermo-hydrodynamic equilibria, 2009, Journal of Fluid Mechanics.

[5] Raffaele Tripiccione, et al. An optimized D2Q37 Lattice Boltzmann code on GP-GPUs, 2013.

[6] S. F. Schifano, et al. Implementation and optimization of a thermal Lattice Boltzmann algorithm on a multi-GPU cluster, 2012, 2012 Innovative Parallel Computing (InPar).

[7] Federico Toschi, et al. A Multi-GPU Implementation of a D2Q37 Lattice Boltzmann Code, 2011, PPAM.

[8] Federico Toschi, et al. Optimization of Multi-Phase Compressible Lattice Boltzmann Codes on Massively Parallel Multi-Core Systems, 2011, ICCS.

[9] Gerhard Wellein, et al. On the single processor performance of simple lattice Boltzmann kernels, 2006.