Pipelined, Flexible Krylov Subspace Methods

We present variants of the conjugate gradient (CG), conjugate residual (CR), and generalized minimal residual (GMRES) methods that are both pipelined and flexible. These variants allow the computation of inner products and norms to be overlapped with the application of the operator and of nonlinear or nondeterministic preconditioners. The methods are hence aimed at hiding the network latencies and synchronizations that can become computational bottlenecks in Krylov methods on extreme-scale systems or in the strong-scaling limit. The new variants are not arithmetically equivalent to their base flexible Krylov methods, but are chosen to be similarly performant in a realistic use case: the application of strong nonlinear preconditioners to large problems that require many Krylov iterations. We provide scalable implementations of our methods as contributions to the PETSc package and demonstrate their effectiveness with practical examples derived from models of mantle convection and lithospheric dynamics with heterogeneous viscosity struc...
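To make the overlap idea concrete, the following is a minimal serial sketch of the simpler, non-flexible and unpreconditioned pipelined CG recurrence. It is not one of the flexible variants introduced here and not the PETSc implementation. The names `matvec`, `dot`, and `pipelined_cg` are hypothetical helpers chosen for illustration. The point it shows is that both inner products in an iteration depend only on vectors available before the operator application `q = A w`, so a parallel code could start them as one nonblocking reduction (for example with `MPI_Iallreduce`) and complete it after the operator application.

```python
def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows (illustrative helper)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Inner product; in a distributed code this is a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def pipelined_cg(A, b, tol=1e-10, max_iter=200):
    """Unpreconditioned pipelined CG for a symmetric positive definite A.

    Returns (x, iterations). Assumes a zero initial guess.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)              # r0 = b - A*0
    w = matvec(A, r)         # w0 = A r0
    p = [0.0] * n
    s = [0.0] * n            # s tracks A p
    z = [0.0] * n            # z tracks A s
    gamma_old = alpha_old = None

    for it in range(max_iter):
        # Both reductions use only r and w, which are already available.
        gamma = dot(r, r)
        delta = dot(w, r)
        # The operator application does not need gamma or delta, so in a
        # parallel code it can run while the reductions are in flight.
        q = matvec(A, w)

        if gamma ** 0.5 <= tol:
            return x, it

        if it == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)

        # Vector recurrences replace the extra operator applications.
        z = [qi + beta * zi for qi, zi in zip(q, z)]
        s = [wi + beta * si for wi, si in zip(w, s)]
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        w = [wi - alpha * zi for wi, zi in zip(w, z)]
        gamma_old, alpha_old = gamma, alpha

    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    x, iters = pipelined_cg(A, b)
    print(iters, x)
```

The price of the overlap is the additional vectors `s`, `z`, and `w` and their recurrences. A flexible method additionally has to tolerate a preconditioner that changes from one iteration to the next, which this sketch does not attempt.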
