Avoiding Communication in Two-Sided Krylov Subspace Methods

Abstract: The cost of an algorithm is a function of both computation, the number of arithmetic operations performed, and communication, the amount of data moved. Communication cost encompasses both data movement between levels of the memory hierarchy and between processors, as well as the number of messages in which the data is sent. On modern computer architectures, communication costs are much greater than computation costs, and the gap is only expected to widen in future systems. Therefore, to improve the performance of an algorithm, we must turn to strategies that minimize communication rather than merely decrease the number of arithmetic operations. We call this a communication-avoiding (CA) approach to algorithm design.
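To make the tradeoff concrete, the CA literature commonly models runtime as T ≈ γF + βW + αS, where F is the number of flops, W the number of words moved, S the number of messages, and γ, β, α are the per-flop, per-word, and per-message costs, with α ≫ β ≫ γ on current machines. The sketch below is a minimal illustration of the s-step idea behind CA Krylov methods, not the specific algorithm of this work: build a basis of s+1 Krylov vectors up front, then recover all needed inner products from a single Gram matrix, so a distributed implementation performs one global reduction per s iterations instead of one (or more) per iteration. The function name and the use of NumPy/SciPy are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def s_step_krylov_basis(A, v, s):
    """Illustrative sketch: build the monomial Krylov basis
    V = [v, A v, A^2 v, ..., A^s v].
    In a CA method, A is partitioned so that all s sparse
    matrix-vector products can be performed with one round of
    neighbor communication (the "matrix powers kernel")."""
    n = v.shape[0]
    V = np.empty((n, s + 1))
    V[:, 0] = v
    for j in range(s):
        V[:, j + 1] = A @ V[:, j]  # sparse matrix-vector product
    return V

# Toy usage: a 1D Laplacian and a random starting vector.
n, s = 1000, 4
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
v = np.random.default_rng(0).standard_normal(n)

V = s_step_krylov_basis(A, v, s)

# A single Gram matrix G = V^T V yields every inner product needed
# for the next s iterations; on a distributed machine this is one
# all-reduce per s steps, replacing the s separate dot-product
# reductions of the classical Krylov iteration.
G = V.T @ V
```

One caveat worth noting: the monomial basis becomes increasingly ill-conditioned as s grows, which is why practical CA Krylov methods substitute better-conditioned Newton or Chebyshev polynomial bases.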
