Avoiding Communication in the Lanczos Bidiagonalization Routine and Associated Least Squares QR Solver

Abstract: Communication, the movement of data between levels of the memory hierarchy or between processors over a network, is the most expensive operation in terms of both time and energy at all scales of computing. Achieving scalable performance in both time and energy thus requires a dramatic shift in algorithm design. Solvers for sparse linear algebra problems, ubiquitous throughout scientific codes, are often the bottleneck in application performance because of their low ratio of computation to communication. In this paper we develop three potential implementations of communication-avoiding Lanczos bidiagonalization algorithms and discuss their different computational requirements. Based on these new algorithms, we also show how to obtain a communication-avoiding LSQR least squares solver.
