Block-asynchronous and Jacobi smoothers for a multigrid solver on GPU-accelerated HPC clusters

We investigate CPU- and GPU-based damped block-asynchronous iteration as an alternative to the damped CPU-based Jacobi smoother in a geometric multigrid linear solver. We describe the implementation for distributed-memory systems as well as for CUDA-capable accelerators. Our numerical experiments are based on the linear problem arising from a finite element discretization of the Poisson equation. Runtime and energy measurements are presented for a dual-CPU test system equipped with a GPU. We find that the smoothing properties of the block-asynchronous smoothers are diminished by their asynchronous nature. When a domain decomposition is used, damped synchronized Jacobi iteration as the smoother, computed on the CPU only with multiple host processes, yields better performance and lower energy consumption than the block-asynchronous variants on both CPU and GPU. However, for a single host process without domain decomposition, the GPU-accelerated block-asynchronous method can compensate for its diminished smoothing properties and outperforms CPU-only execution in terms of both runtime and energy consumption.
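To make the difference between the two smoothers concrete, here is a minimal Python sketch, assuming a 1D model problem with matrix A = tridiag(-1, 2, -1) and homogeneous Dirichlet boundaries (the paper itself uses a finite element discretization of the Poisson equation, not this toy matrix). `damped_jacobi_sweep` computes x ← x + ω D⁻¹(b − A x) from a single old iterate, so every entry sees the same data. `block_async_sweep` is a hypothetical, sequential emulation of block-asynchronous iteration: blocks are visited in a random order, each block runs a few damped Jacobi iterations with its off-block neighbors frozen at whatever values are current when the block starts, and writes its result back immediately. The block size, local iteration count, ω = 2/3, and the seeded shuffle are illustrative assumptions; on a GPU the mix of old and new neighbor values is decided by the hardware scheduler, not by a random generator.

```python
import random


def residual(x, b):
    """r = b - A x for A = tridiag(-1, 2, -1) with zero Dirichlet boundaries."""
    n = len(x)
    r = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        r.append(b[i] - (2.0 * x[i] - left - right))
    return r


def norm(v):
    """Euclidean norm of a list of floats."""
    return sum(t * t for t in v) ** 0.5


def damped_jacobi_sweep(x, b, omega=2.0 / 3.0):
    """One synchronous sweep: all entries are updated from the same old iterate."""
    r = residual(x, b)
    # D = diag(A) = 2 on every row, so D^{-1} r is simply r / 2.
    return [x[i] + omega * r[i] / 2.0 for i in range(len(x))]


def block_async_sweep(x, b, block_size=4, local_iters=2, omega=2.0 / 3.0, rng=None):
    """Hypothetical sequential emulation of one block-asynchronous sweep.

    Blocks are visited in random order and write their result into x at once,
    so blocks visited later read a mix of updated and old neighbor values.
    Inside a block, `local_iters` damped Jacobi iterations run with the
    off-block (halo) values frozen.
    """
    if rng is None:
        rng = random.Random(0)
    x = list(x)  # do not modify the caller's iterate
    n = len(x)
    starts = list(range(0, n, block_size))
    rng.shuffle(starts)
    for s in starts:
        e = min(s + block_size, n)
        # Halo values as seen at the moment this block starts working.
        left_halo = x[s - 1] if s > 0 else 0.0
        right_halo = x[e] if e < n else 0.0
        for _ in range(local_iters):
            old = x[s:e]  # local Jacobi: every update inside the block reads `old`
            for j in range(e - s):
                left = old[j - 1] if j > 0 else left_halo
                right = old[j + 1] if j < e - s - 1 else right_halo
                r = b[s + j] - (2.0 * old[j] - left - right)
                x[s + j] = old[j] + omega * r / 2.0
    return x


if __name__ == "__main__":
    n = 32
    b = [1.0] * n
    xj = xa = [0.0] * n  # neither sweep mutates its input list
    for sweep in range(1, 6):
        xj = damped_jacobi_sweep(xj, b)
        xa = block_async_sweep(xa, b, rng=random.Random(sweep))
        print(sweep, norm(residual(xj, b)), norm(residual(xa, b)))
```

In the emulation, the result of a block-asynchronous sweep depends on the order in which blocks are visited, while the synchronous Jacobi sweep is order independent. The sketch only shows where this order dependence enters; it does not reproduce the paper's runtime, energy, or smoothing measurements.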

Vincent Heuveline, Martin Wlotzka
