The Communication-Hiding Conjugate Gradient Method with Deep Pipelines

Krylov subspace methods are among the most efficient present-day solvers for large-scale linear algebra problems. Nevertheless, classic Krylov subspace algorithms do not scale well on massively parallel hardware due to the synchronization bottleneck induced by the dot-product computations that occur in each iteration. Communication-hiding pipelined Krylov subspace methods offer improved parallel scalability. One of the first published methods in this class is the pipelined Conjugate Gradient method (p-CG), which achieves speedups on parallel machines by overlapping the time-consuming global communication phase with useful independent computations, such as sparse matrix-vector products (spmvs). This reduces the impact of global communication as a synchronization bottleneck and avoids excessive processor idling. However, on large numbers of processors the time spent in the global communication phase can significantly exceed the time required to compute a single spmv. This work extends the pipelined CG method to deeper pipelines, which allows further scaling in the regime where global communication is the dominant cost. By overlapping the global all-to-all reduction phase of each CG iteration with the next l spmvs (pipelining with depth l), the method hides communication latency behind computational work. The derivation of the p(l)-CG algorithm is based on the existing p(l)-GMRES method. Moreover, a number of theoretical and implementation properties of the p(l)-CG method are presented, including a preconditioned version of the algorithm. Experimental results demonstrate the possible performance gains of using deeper pipelines for solving large-scale symmetric linear systems with the new CG variant.
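
To make the overlap concrete, the sketch below shows the basic communication-hiding pattern in C with MPI for pipeline depth l = 1. It is a simplified illustration under the assumption of a non-blocking all-reduce (MPI_Iallreduce, available since MPI-3) and a CSR matrix layout; it is not the actual p(l)-CG algorithm, and the names spmv and pipelined_step are hypothetical placeholders introduced here purely for illustration.

/* Minimal sketch of the communication-hiding mechanism for pipeline
   depth l = 1, assuming MPI-3 and a CSR matrix format. The routines
   spmv and pipelined_step are hypothetical placeholders. */
#include <mpi.h>

/* Hypothetical local sparse matrix-vector product w = A*v in CSR format. */
static void spmv(int n, const int *row_ptr, const int *col_idx,
                 const double *val, const double *v, double *w)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * v[col_idx[k]];
        w[i] = sum;
    }
}

/* One overlapped step: start the global reduction of this iteration's
   local dot products, perform the independent spmv while the reduction
   is in flight, and only then wait for the reduced values. */
static void pipelined_step(int n, const int *row_ptr, const int *col_idx,
                           const double *val, const double *v, double *w,
                           const double local_dots[2], double global_dots[2])
{
    MPI_Request req;

    /* Non-blocking all-reduce: the global synchronization starts here ... */
    MPI_Iallreduce(local_dots, global_dots, 2, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... and its latency is hidden behind the local spmv. */
    spmv(n, row_ptr, col_idx, val, v, w);

    /* Ideally the reduction has already completed when we wait for it. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}

With a pipeline depth l > 1, the same reduction would instead be overlapped with the next l consecutive spmv calls, so that even a reduction whose latency exceeds the cost of a single spmv can complete before its result is required.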
