Numerically Stable Variants of the Communication-hiding Pipelined Conjugate Gradients Algorithm for the Parallel Solution of Large Scale Symmetric Linear Systems

By reducing the number of global synchronization bottlenecks per iteration and hiding communication behind useful computational work, pipelined Krylov subspace methods achieve significantly improved parallel scalability on present-day HPC hardware. However, this typically comes at the cost of reduced maximal attainable accuracy. This paper presents and compares several stabilized versions of the communication-hiding pipelined Conjugate Gradients method. The main novel contribution of this work is a reformulation of the multi-term recurrence pipelined CG algorithm that introduces shifts in the recursions for specific auxiliary variables. These shifts reduce the amplification of local rounding errors on the residual. The stability analysis presented in this work provides a rigorous method for selecting the optimal shift value in practice. It is shown that, given a proper choice of the shift parameter, the resulting shifted pipelined CG algorithm restores the attainable accuracy and is nearly as robust to the propagation of local rounding errors as classical CG. Numerical results on a variety of SPD benchmark problems compare different stabilization techniques for the pipelined CG algorithm and show that the shifted pipelined CG algorithm attains high accuracy while displaying excellent parallel performance.
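To make the trade-off described above concrete, the following is a minimal sketch, not the authors' implementation, that compares classical CG with the unshifted, unpreconditioned pipelined CG recurrences in the form popularized by Ghysels and Vanroose. Pipelined CG fuses its two dot products into a single global reduction that, in a parallel code, would be overlapped with the matrix-vector product `q = A w`; the auxiliary vectors `w = A r`, `s = A p` and `z = A s` are updated by recurrences instead of being recomputed. Rounding errors in those extra recurrences propagate into the recursive residual `r`, which is measured here as the gap between `b - A x` and `r`. The shifted recurrences proposed in this work are not reproduced. The dense list-of-lists matrices, the helper names (`matvec`, `dot`, `axpy`, `residual_gap`), the tolerance, and the tridiagonal test matrix are all illustrative assumptions.

```python
import math


def matvec(A, v):
    # Dense matrix-vector product; stands in for the SpMV of a real solver.
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    # In a distributed code every dot product is a global reduction.
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    # Returns the new list y + alpha * x (inputs are not modified).
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def residual_gap(A, b, x, r):
    # ||(b - A x) - r||: drift of the recursive residual from the true one.
    true_r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    return math.sqrt(sum((t - ri) ** 2 for t, ri in zip(true_r, r)))


def classical_cg(A, b, tol=1e-10, maxit=200):
    x = [0.0] * len(b)
    r = list(b)                       # x0 = 0, so r0 = b
    p = list(r)
    gamma = dot(r, r)
    bnorm = math.sqrt(dot(b, b))
    for _ in range(maxit):
        if math.sqrt(gamma) <= tol * bnorm:
            break
        s = matvec(A, p)              # s_i = A p_i
        alpha = gamma / dot(p, s)     # first global reduction
        x = axpy(alpha, p, x)
        r = axpy(-alpha, s, r)
        gamma_new = dot(r, r)         # second global reduction
        p = axpy(gamma_new / gamma, p, r)   # p_{i+1} = r_{i+1} + beta p_i
        gamma = gamma_new
    return x, r


def pipelined_cg(A, b, tol=1e-10, maxit=200):
    n = len(b)
    x = [0.0] * n
    r = list(b)                       # x0 = 0, so r0 = b
    w = matvec(A, r)                  # w_0 = A r_0
    p, s, z = [0.0] * n, [0.0] * n, [0.0] * n   # unused while beta = 0
    gamma_old = alpha_old = None
    bnorm = math.sqrt(dot(b, b))
    for i in range(maxit):
        gamma = dot(r, r)             # these two dot products form one fused
        delta = dot(w, r)             # reduction in a parallel implementation
        if math.sqrt(gamma) <= tol * bnorm:
            break
        q = matvec(A, w)              # q_i = A w_i (would overlap the reduction)
        if i == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        z = axpy(beta, z, q)          # z_i = q_i + beta z_{i-1}  (= A s_i)
        s = axpy(beta, s, w)          # s_i = w_i + beta s_{i-1}  (= A p_i)
        p = axpy(beta, p, r)          # p_i = r_i + beta p_{i-1}
        x = axpy(alpha, p, x)
        r = axpy(-alpha, s, r)
        w = axpy(-alpha, z, w)        # w_{i+1} = A r_{i+1}, by recurrence only
        gamma_old, alpha_old = gamma, alpha
    return x, r


if __name__ == "__main__":
    n = 20
    # Hypothetical SPD test matrix tridiag(-1, 4, -1); exact solution is all ones.
    A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
          for j in range(n)] for i in range(n)]
    b = matvec(A, [1.0] * n)
    for solver in (classical_cg, pipelined_cg):
        x, r = solver(A, b)
        print(solver.__name__, "residual gap:", residual_gap(A, b, x, r))
```

On a well-conditioned matrix like this one, both gaps stay tiny; the loss of attainable accuracy discussed in the abstract shows up on harder SPD problems, where the extra recurrences for `w`, `s` and `z` amplify local rounding errors more strongly.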
