Analyzing and improving maximal attainable accuracy in the communication hiding pipelined BiCGStab method

Abstract Pipelined Krylov subspace methods avoid communication latency by reducing the number of global synchronization bottlenecks and by hiding global communication behind useful computational work. In exact arithmetic, pipelined Krylov subspace algorithms are equivalent to classic Krylov subspace methods and generate identical series of iterates. However, as a consequence of the reformulation of the algorithm to improve parallelism, pipelined methods may suffer from severely reduced attainable accuracy in a practical finite precision setting. This work presents a numerical stability analysis that describes and quantifies the impact of local rounding error propagation on the maximal attainable accuracy of the multi-term recurrences in the preconditioned pipelined BiCGStab method. Theoretical expressions for the gaps between the true and computed residuals, as well as for other auxiliary variables used in the algorithm, are derived, and the elementary dependencies of these gaps on the various recursively computed vector variables are analyzed. The norms of the corresponding propagation matrices and vectors provide insight into the possible amplification of local rounding errors throughout the algorithm. The stability of the pipelined BiCGStab method is compared numerically to that of pipelined CG on a symmetric benchmark problem. Furthermore, numerical evidence is provided to support the effectiveness of a residual replacement type strategy for improving the maximal attainable accuracy of the pipelined BiCGStab method.
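The residual gap discussed in the abstract can be observed even in the classic (non-pipelined) BiCGStab recurrence: the recursively updated residual r_i drifts away from the true residual b - A x_i due to local rounding errors. The following minimal sketch (textbook BiCGStab of van der Vorst, without preconditioning or pipelining; the test matrix and tolerances are arbitrary illustrative choices) instruments the iteration to measure that gap.

```python
import numpy as np

def bicgstab_with_gap(A, b, x0, tol=1e-10, maxiter=200):
    """Classic BiCGStab, returning the final iterate, the recursively
    updated residual, and the residual gap ||(b - A x) - r||."""
    x = x0.copy()
    r = b - A @ x                      # true residual at start
    rhat = r.copy()                    # fixed shadow residual
    rho = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    for _ in range(maxiter):
        rho_new = rhat @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho_new / (rhat @ v)
        s = r - alpha * v
        t = A @ s
        omega = (t @ s) / (t @ t)
        x = x + alpha * p + omega * s
        r = s - omega * t              # recursively updated residual
        rho = rho_new
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    gap = np.linalg.norm((b - A @ x) - r)
    return x, r, gap

# Small nonsymmetric, diagonally dominant test system.
n = 50
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), -1)
     + np.diag(-2.0 * np.ones(n - 1), 1))
b = np.ones(n)
x, r, gap = bicgstab_with_gap(A, b, np.zeros(n))
true_res_norm = np.linalg.norm(b - A @ x)
print(f"recursive ||r|| = {np.linalg.norm(r):.2e}, "
      f"true ||b - Ax|| = {true_res_norm:.2e}, gap = {gap:.2e}")
```

On this well-conditioned system the gap stays near machine precision; the analysis in the paper concerns how the additional auxiliary recurrences of the pipelined reformulation can amplify such local rounding errors much more strongly.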
