Block-asynchronous and Jacobi smoothers for a multigrid solver on GPU-accelerated HPC clusters

We investigate CPU- and GPU-based damped block-asynchronous iteration as an alternative to the damped CPU-based Jacobi smoother in a geometric multigrid linear solver. We describe the implementation for distributed-memory systems as well as for CUDA-capable accelerators. Our numerical experiments are based on the linear system arising from a finite element discretization of the Poisson equation. Runtime and energy measurements are presented for a dual-CPU test system equipped with a GPU. We find that the smoothing properties of the block-asynchronous smoothers are diminished by their asynchronous nature. When using a domain decomposition, damped synchronized Jacobi iteration as the smoother, with CPU-only computation on multiple host processes, yields better performance and lower energy consumption than the block-asynchronous variants for both CPU and GPU execution. However, for a single host process without domain decomposition, the GPU-accelerated block-asynchronous method can compensate for the diminished smoothing property and outperforms the CPU-only execution in terms of both runtime and energy consumption.
