Low-Synch Gram-Schmidt with Delayed Reorthogonalization for Krylov Solvers

The parallel strong scaling of Krylov iterative methods is largely determined by the number of global reductions required at each iteration. The GMRES and Krylov-Schur algorithms compute the Arnoldi expansion for nonsymmetric matrices. The underlying algorithm is "left-looking" and processes one column at a time, so at least one global reduction is required per iteration. The usual method for generating the orthogonal Krylov basis in the Krylov-Schur algorithm is classical Gram-Schmidt applied twice (CGS2), which requires three global reductions per iteration. A new variant of CGS2 that requires only one reduction per iteration is applied to the Arnoldi-QR iteration. Strong-scaling results are presented for finding eigenpairs of nonsymmetric matrices. A preliminary attempt to derive a similar parallel method (one reduction per Arnoldi iteration with a robust orthogonalization scheme) was presented by Hernandez et al. [1]. Unlike our approach, their method is not forward stable for eigenvalues.
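To make the reduction count concrete, the following is a minimal serial sketch of the baseline Arnoldi iteration with standard CGS2, not the new one-reduction variant. The function name `arnoldi_cgs2` and the `reductions` counter are illustrative assumptions. The sketch assumes that, in a distributed-memory code where the basis vectors are split by rows across processes, each marked inner-product or norm step would become one global reduction (for example, an MPI allreduce).

```python
import numpy as np

def arnoldi_cgs2(A, v0, m):
    """Serial sketch of m Arnoldi steps using classical Gram-Schmidt twice (CGS2).

    Returns Q (n x (m+1)), H ((m+1) x m) with A @ Q[:, :m] ~= Q @ H, and the
    number of steps that would be global reductions in a distributed code.
    Assumes no breakdown (the new vector never has zero norm).
    """
    n = v0.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    reductions = 0  # hypothetical counter, for illustration only

    Q[:, 0] = v0 / np.linalg.norm(v0)  # one-time setup, not counted per iteration

    for j in range(m):
        w = A @ Q[:, j]        # matrix-vector product (neighbor communication only)
        Qj = Q[:, : j + 1]

        h1 = Qj.T @ w          # first projection: global reduction 1
        reductions += 1
        w = w - Qj @ h1

        h2 = Qj.T @ w          # reorthogonalization pass: global reduction 2
        reductions += 1
        w = w - Qj @ h2

        beta = np.linalg.norm(w)  # normalization: global reduction 3
        reductions += 1

        H[: j + 1, j] = h1 + h2   # combine coefficients from both passes
        H[j + 1, j] = beta
        Q[:, j + 1] = w / beta

    return Q, H, reductions
```

Calling `arnoldi_cgs2(A, v0, m)` on a dense NumPy array `A` performs `3 * m` counted reductions, matching the three reductions per iteration described above. A one-reduction variant would need to merge these three synchronization points into a single one per iteration, which this sketch does not attempt.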

[1]  Luke N. Olson, et al. Node-Aware Improvements to Allreduce, 2019, 2019 IEEE/ACM Workshop on Exascale MPI (ExaMPI).

[2]  Christopher C. Paige, et al. The Effects of Loss of Orthogonality on Large Scale Numerical Computations, 2018, ICCSA.

[3]  Miroslav Rozlozník, et al. Modified Gram-Schmidt (MGS), Least Squares, and Backward Stability of MGS-GMRES, 2006, SIAM J. Matrix Anal. Appl.

[4]  R. Kahn. The Effects of a Loss, 1989.

[5]  Y. Saad, et al. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, 1986.

[6]  T. Manteuffel. Adaptive procedure for estimating parameters for the nonsymmetric Tchebychev iteration, 1978.

[7]  Sivasankaran Rajamanickam, et al. Amesos2 and Belos: Direct and iterative solvers for large sparse linear systems, 2012, Sci. Program.

[8]  Luc Giraud, et al. On the Influence of the Orthogonalization Scheme on the Parallel Performance of GMRES, 1998, Euro-Par.

[9]  Shreyas Ananthan, et al. Low synchronization Gram–Schmidt and generalized minimal residual algorithms, 2020, Numer. Linear Algebra Appl.

[10]  Ichitaro Yamazaki, et al. Low-synchronization orthogonalization schemes for s-step and pipelined Krylov solvers in Trilinos, 2020, PPSC.

[11]  Andrés Tomás, et al. Parallel Arnoldi eigensolvers with enhanced scalability via global communications rearrangement, 2007, Parallel Comput.

[12]  Tamara G. Kolda, et al. An overview of the Trilinos project, 2005, TOMS.

[13]  S. K. Kim, et al. An Efficient Parallel Algorithm for Extreme Eigenvalues of Sparse Nonsymmetric Matrices, 1992.

[14]  Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, TOMS.

[15]  Julien Langou, et al. A note on the error analysis of classical Gram–Schmidt, 2006, Numerische Mathematik.

[16]  J. E. Román, et al. Krylov-Schur Methods in SLEPc, 2015.

[17]  Miroslav Rozlozník, et al. An overview of block Gram-Schmidt methods and their stability properties, 2020, ArXiv.

[18]  G. W. Stewart, et al. A Krylov-Schur Algorithm for Large Eigenproblems, 2001, SIAM J. Matrix Anal. Appl.

[19]  Julien Langou, et al. Rounding error analysis of the classical Gram-Schmidt orthogonalization process, 2005, Numerische Mathematik.