The Communication-Hiding Conjugate Gradient Method with Deep Pipelines

Krylov subspace methods are among the most efficient present-day solvers for large-scale linear algebra problems. Nevertheless, classic Krylov subspace algorithms do not scale well on massively parallel hardware due to the synchronization bottleneck induced by the dot-product computations that occur in each iteration. Communication-hiding pipelined Krylov subspace methods offer improved parallel scalability. One of the first published methods in this class is the pipelined Conjugate Gradient method (p-CG), which achieves speedups on parallel machines by overlapping the time-consuming global communication phase with useful independent computations, such as sparse matrix-vector products (spmvs). This reduces the impact of global communication as a synchronization bottleneck and avoids excessive processor idling. However, on large numbers of processors the time spent in the global communication phase can significantly exceed the time required to compute a single spmv. This work extends the pipelined CG method to deeper pipelines, which allows further scaling in the regime where global communication is the dominant cost. By overlapping the global all-to-all reduction phase of each CG iteration with the next l spmvs (pipelining with depth l), the method hides communication latency behind computational work. The derivation of the p(l)-CG algorithm is based on the existing p(l)-GMRES method. Moreover, a number of theoretical and implementation properties of the p(l)-CG method are presented, including a preconditioned version of the algorithm. Experimental results demonstrate the possible performance gains of using deeper pipelines for solving large-scale symmetric linear systems with the new CG variant.
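
To make the overlap concrete, the sketch below shows the basic communication-hiding pattern in C with MPI for pipeline depth l = 1. It is a simplified illustration under the assumption of a non-blocking all-reduce (MPI_Iallreduce, available since MPI-3) and a CSR matrix layout; it is not the actual p(l)-CG algorithm, and the names spmv and pipelined_step are hypothetical placeholders introduced here purely for illustration.

/* Minimal sketch of the communication-hiding mechanism for pipeline
   depth l = 1, assuming MPI-3 and a CSR matrix format. The routines
   spmv and pipelined_step are hypothetical placeholders. */
#include <mpi.h>

/* Hypothetical local sparse matrix-vector product w = A*v in CSR format. */
static void spmv(int n, const int *row_ptr, const int *col_idx,
                 const double *val, const double *v, double *w)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * v[col_idx[k]];
        w[i] = sum;
    }
}

/* One overlapped step: start the global reduction of this iteration's
   local dot products, perform the independent spmv while the reduction
   is in flight, and only then wait for the reduced values. */
static void pipelined_step(int n, const int *row_ptr, const int *col_idx,
                           const double *val, const double *v, double *w,
                           const double local_dots[2], double global_dots[2])
{
    MPI_Request req;

    /* Non-blocking all-reduce: the global synchronization starts here ... */
    MPI_Iallreduce(local_dots, global_dots, 2, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... and its latency is hidden behind the local spmv. */
    spmv(n, row_ptr, col_idx, val, v, w);

    /* Ideally the reduction has already completed when we wait for it. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}

With a pipeline depth l > 1, the same reduction would instead be overlapped with the next l consecutive spmv calls, so that even a reduction whose latency exceeds the cost of a single spmv can complete before its result is required.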
