Block-Relaxation Methods for 3D Constant-Coefficient Stencils on GPUs and Multicore CPUs

Block iterative methods are widely used as smoothers for multigrid methods, as preconditioners for Krylov methods, and as solvers for diagonally dominant linear systems. Developing robust and efficient smoothers for current and evolving GPU and multicore CPU systems is a significant challenge. We address this challenge for constant-coefficient stencils arising in the solution of elliptic partial differential equations on structured 3D grids, both uniform and adaptively refined block-structured. We develop robust, highly parallel implementations of block Jacobi and chaotic block Gauss-Seidel algorithms with exact inversion of the blocks, using several parallelization techniques. We present experimental results for NVIDIA Fermi and Kepler GPUs and for AMD multicore systems.
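As a concrete illustration of block Jacobi relaxation with exact block inversion, the following minimal Python sketch applies the idea to the 7-point Laplacian on a small uniform 3D grid with homogeneous Dirichlet boundaries. It is not the implementation described above: the choice of grid lines as blocks, the Thomas tridiagonal solver, and all function names (solve_tridiagonal, block_jacobi_sweep, residual_norm) are assumptions made for illustration.

```python
# Sketch only: line-block Jacobi for 6*u[i][j][k] - (sum of 6 neighbors) = f[i][j][k].
# Assumption: each block is one grid line along k, solved exactly (tridiagonal).

def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a constant-coefficient tridiagonal system
    with `diag` on the diagonal and `off` on both off-diagonals."""
    n = len(rhs)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for k in range(1, n):
        denom = diag - off * c[k - 1]
        c[k] = off / denom
        d[k] = (rhs[k] - off * d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):  # back substitution
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def block_jacobi_sweep(u, f):
    """One block Jacobi sweep: every line is solved exactly using only
    values from the previous iterate `u` in the neighboring lines."""
    n = len(u)

    def old(i, j, k):
        # Values outside the grid are the zero Dirichlet boundary.
        return u[i][j][k] if 0 <= i < n and 0 <= j < n else 0.0

    new = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rhs = [
                f[i][j][k]
                + old(i - 1, j, k) + old(i + 1, j, k)
                + old(i, j - 1, k) + old(i, j + 1, k)
                for k in range(n)
            ]
            new[i][j] = solve_tridiagonal(6.0, -1.0, rhs)
    return new


def residual_norm(u, f):
    """Maximum-norm residual of the 7-point stencil equations."""
    n = len(u)

    def at(i, j, k):
        inside = 0 <= i < n and 0 <= j < n and 0 <= k < n
        return u[i][j][k] if inside else 0.0

    worst = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                au = (6.0 * at(i, j, k)
                      - at(i - 1, j, k) - at(i + 1, j, k)
                      - at(i, j - 1, k) - at(i, j + 1, k)
                      - at(i, j, k - 1) - at(i, j, k + 1))
                worst = max(worst, abs(f[i][j][k] - au))
    return worst


if __name__ == "__main__":
    n = 6
    f = [[[1.0] * n for _ in range(n)] for _ in range(n)]
    u = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    print("initial residual:", residual_norm(u, f))
    for _ in range(100):
        u = block_jacobi_sweep(u, f)
    print("residual after 100 sweeps:", residual_norm(u, f))
```

Because each line solve reads only values from the previous sweep, the solves within a sweep are independent of one another and could be distributed across threads. A chaotic block Gauss-Seidel variant would instead let each block read whatever neighbor values happen to be in memory at the time, without synchronizing between blocks.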
