Load Balancing for CPU-GPU Coupling in Computational Fluid Dynamics

This paper investigates static load balancing models for CPU-GPU coupling from a computational fluid dynamics perspective. Although traditional load balancing models provide a benefit, they prove too inaccurate to predict the runtime of a preconditioned conjugate gradient solver. Hence, an expanded model is derived that accounts for the multi-step nature of the solver, i.e. several communication barriers per iteration. It predicts the runtime to within 5%, making CPU-GPU coupling more predictable so that load balancing can be improved substantially.
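The difference between a single-barrier model and one that respects several barriers per iteration can be illustrated with a small sketch. It assumes a static split in which the GPU receives a fixed fraction of the work in every solver step and each device runs a step at a constant rate. The traditional model collapses an iteration into one effective rate per device. The expanded model charges each step the time of its slower device. The step data, function names, and rates are hypothetical and do not reproduce the paper's actual model or measurements.

```python
"""Sketch: single-barrier vs. multi-barrier runtime model for a static CPU-GPU split.

Hypothetical illustration only; names and step data are assumptions,
not the model or measurements from the paper.
"""
from typing import List, Tuple

# One solver step between two barriers: (work units, CPU rate, GPU rate),
# with rates in work units per second. Hypothetical data layout.
Step = Tuple[float, float, float]


def effective_rates(steps: List[Step]) -> Tuple[float, float]:
    """Collapse all steps into one rate per device, as a single-barrier model does."""
    work = sum(w for w, _, _ in steps)
    cpu_time = sum(w / c for w, c, _ in steps)  # CPU time for the whole iteration
    gpu_time = sum(w / g for w, _, g in steps)  # GPU time for the whole iteration
    return work / cpu_time, work / gpu_time


def balanced_gpu_fraction(cpu_rate: float, gpu_rate: float) -> float:
    """Share of the work given to the GPU so both devices finish together."""
    return gpu_rate / (cpu_rate + gpu_rate)


def single_barrier_time(frac_gpu: float, steps: List[Step]) -> float:
    """Traditional prediction: one barrier per iteration, constant rates."""
    cpu_rate, gpu_rate = effective_rates(steps)
    work = sum(w for w, _, _ in steps)
    return max((1.0 - frac_gpu) * work / cpu_rate, frac_gpu * work / gpu_rate)


def multi_barrier_time(frac_gpu: float, steps: List[Step]) -> float:
    """Expanded prediction: the devices synchronise after every step,
    so each step costs as much as its slower device."""
    total = 0.0
    for work, cpu_rate, gpu_rate in steps:
        t_cpu = (1.0 - frac_gpu) * work / cpu_rate
        t_gpu = frac_gpu * work / gpu_rate
        total += max(t_cpu, t_gpu)
    return total


if __name__ == "__main__":
    # Step 1 favours the GPU (4x faster); step 2 runs equally fast on both.
    steps = [(1.0, 1.0, 4.0), (1.0, 1.0, 1.0)]
    f = balanced_gpu_fraction(*effective_rates(steps))
    print(f"GPU share:            {f:.3f}")
    print(f"single-barrier model: {single_barrier_time(f, steps):.3f} s")
    print(f"multi-barrier model:  {multi_barrier_time(f, steps):.3f} s")
```

With these made-up numbers, the balanced GPU share is about 0.615. The single-barrier model predicts about 0.769 s per iteration, while the per-step model gives 1.000 s. In this toy setting, the single-barrier prediction equals the maximum of the summed per-device times. That value can never exceed the sum of the per-step maxima, so ignoring intermediate barriers can only underestimate the runtime.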
