A flexible patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters

Sustaining a large fraction of single-GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. We address this issue in the context of a lattice Boltzmann flow solver that is integrated into the WaLBerla software framework. Our multi-GPU implementation uses a block-structured MPI parallelization and is suitable for load balancing and for heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail. We demonstrate that a large fraction of the kernel performance can be sustained for weak scaling on InfiniBand clusters, leading to excellent parallel efficiency. In strong scaling scenarios, however, using multiple GPUs is much less efficient than running CPU-only simulations on IBM BG/P and x86-based clusters. Hence, a cost analysis must determine the best course of action for a particular simulation task and hardware configuration. Finally, we present weak scaling results of heterogeneous simulations conducted on CPUs and GPUs simultaneously, using clusters equipped with varying node configurations.
