Avoiding Communication in Nonsymmetric Lanczos-Based Krylov Subspace Methods

Krylov subspace methods are iterative methods for solving large, sparse linear systems and eigenvalue problems in a variety of scientific domains. On modern computer architectures, communication, that is, the movement of data, takes much longer than the equivalent amount of computation. Classical formulations of Krylov subspace methods require data movement in each iteration, creating a performance bottleneck that increases runtime. This motivated $s$-step, or communication-avoiding, Krylov subspace methods, which perform data movement only once every $O(s)$ iterations. We present new communication-avoiding Krylov subspace methods, CA-BICG and CA-BICGSTAB. We are the first to provide derivations of these methods. For both sequential and parallel implementations, our methods reduce data movement by a factor of $O(s)$ versus the classical algorithms. We implement various polynomial bases and perform convergence experiments to enable comparison with the classical algorithms. We discuss recent results in improving both...
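The polynomial bases mentioned above can be illustrated with a minimal sketch. It generates the $s+1$ vectors $p_0(A)v, \dots, p_s(A)v$ that an $s$-step method would build in one block, using the recurrence $p_{j+1}(z) = (z - \theta_j)\,p_j(z)$. With all shifts $\theta_j = 0$ this is the monomial basis; with nonzero shifts it is a Newton-type basis. The function names `matvec` and `krylov_basis`, the dense list-of-lists matrix, and the example shifts are assumptions made for illustration. This is not the CA-BICG or CA-BICGSTAB derivation from the paper, and the code does nothing to reduce data movement; it only shows what the basis vectors are.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows (illustrative only)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def krylov_basis(A, v, s, shifts=None):
    """Return [p_0(A)v, ..., p_s(A)v] with p_{j+1}(z) = (z - shifts[j]) * p_j(z).

    shifts=None gives the monomial basis [v, Av, ..., A^s v].
    Nonzero shifts give a Newton-type basis (the shift values are an assumption
    supplied by the caller, e.g. eigenvalue estimates).
    """
    if shifts is None:
        shifts = [0.0] * s  # monomial basis
    basis = [list(v)]
    for j in range(s):
        w = matvec(A, basis[-1])
        # Subtract the shifted previous vector: (A - shifts[j] * I) * basis[-1]
        basis.append([wi - shifts[j] * bi for wi, bi in zip(w, basis[-1])])
    return basis


# Small hypothetical example: a 2x2 upper-triangular matrix with eigenvalues 2 and 3.
A = [[2, 1],
     [0, 3]]
v = [0, 1]

monomial = krylov_basis(A, v, s=2)
newton = krylov_basis(A, v, s=2, shifts=[2, 3])
```

In this example the Newton-type basis built with shifts equal to the eigenvalues ends in the zero vector, since $(A - 3I)(A - 2I) = 0$ for this matrix. In practice the shifts would only be estimates, chosen to keep the basis vectors better conditioned than repeated powers of $A$.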
