Minimizing Communication for Eigenproblems and the Singular Value Decomposition

Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In [4] lower bounds were presented on the amount of communication required for essentially all O(n^3)-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.

[1]  James Demmel,et al.  The generalized Schur decomposition of an arbitrary pencil A–λB—robust software with error bounds and applications. Part II: software and applications , 1993, TOMS.

[2]  V. Kublanovskaya,et al.  An approach to solving the spectral problem of A-λB , 1983 .

[3]  A. Malyshev Computing invariant subspaces of a regular linear pencil of matrices , 1989 .

[4]  D. Sorensen,et al.  LAPACK Working Note No. 2: Block reduction of matrices to condensed forms for eigenvalue computations , 1987 .

[5]  Ya Yan Lu,et al.  Eigenvalues of the Laplacian through boundary integral equations , 1991 .

[6]  James Demmel,et al.  Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl.

[7]  Ed Anderson,et al.  LAPACK users' guide - [release 1.0] , 1992 .

[8]  James Demmel,et al.  Communication avoiding successive band reduction , 2012, PPoPP '12.

[9]  H. Rutishauser On Jacobi rotation patterns , 1963 .

[10]  Greg Henry The Shifted Hessenberg System Solve Computation , 1994 .

[11]  Dror Irony,et al.  Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput.

[12]  S. Godunov,et al.  Circular dichotomy of the spectrum of a matrix , 1988 .

[13]  H. T. Kung,et al.  I/O complexity: The red-blue pebble game , 1981, STOC '81.

[14]  J. L. Howland The sign matrix and the separation of matrix eigenvalues , 1983 .

[15]  G. Stewart On graded QR decompositions of products of matrices , 1994 .

[16]  Jack Dongarra,et al.  ScaLAPACK Users' Guide , 1987 .

[17]  James Demmel,et al.  Fast linear algebra is stable , 2006, Numerische Mathematik.

[18]  Matemática,et al.  Society for Industrial and Applied Mathematics , 2010 .

[19]  L. Trefethen,et al.  Eigenvalues and pseudo-eigenvalues of Toeplitz matrices , 1992 .

[20]  Xiaobai Sun,et al.  Parallel tridiagonalization through two-step band reduction , 1994, Proceedings of IEEE Scalable High Performance Computing Conference.

[21]  James Demmel,et al.  Communication-optimal Parallel and Sequential QR and LU Factorizations , 2008, SIAM J. Sci. Comput.

[22]  Alan Edelman,et al.  The dimension of matrices (matrix pencils) with given Jordan (Kronecker) canonical forms , 1995 .

[23]  L. Auslander,et al.  On parallelizable eigensolvers , 1992 .

[24]  James Demmel,et al.  The generalized Schur decomposition of an arbitrary pencil A–λB—robust software with error bounds and applications. Part I: theory and algorithms , 1993, TOMS.

[25]  Marc Snir,et al.  Getting Up to Speed: The Future of Supercomputing , 2004 .

[26]  A. Malyshev Parallel Algorithm for Solving Some Spectral Problems of Linear Algebra , 1993 .

[27]  G. Stewart Gershgorin theory for the generalized eigenvalue problem , 1975 .

[28]  J. Demmel,et al.  An inverse free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems , 1997 .

[29]  Per Christian Hansen,et al.  Some Applications of the Rank Revealing QR Factorization , 1992, SIAM J. Sci. Comput.

[30]  C. Bischof Incremental condition estimation , 1990 .

[31]  Christian H. Bischof,et al.  A framework for symmetric band reduction , 2000, TOMS.

[32]  Enrique S. Quintana-Ortí,et al.  Specialized Spectral Division Algorithms for Generalized Eigenproblems Via the Inverse-Free Iteration , 2006, PARA.

[33]  Ming Gu,et al.  Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization , 1996, SIAM J. Sci. Comput.

[34]  Bruno Lang,et al.  A Parallel Algorithm for Reducing Symmetric Banded Matrices to Tridiagonal Form , 1993, SIAM J. Sci. Comput.

[35]  Christian H. Bischof,et al.  Algorithm 807: The SBR Toolbox—software for successive band reduction , 2000, TOMS.

[36]  Xiaobai Sun,et al.  The PRISM project: infrastructure and algorithms for parallel eigensolvers , 1993, Proceedings of Scalable Parallel Libraries Conference.

[37]  L. Trefethen,et al.  Spectra and Pseudospectra , 2020 .

[38]  Jeremy D. Frens,et al.  QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism , 2003, PPoPP '03.

[39]  James Demmel,et al.  Minimizing Communication in Linear Algebra , 2009, ArXiv.

[40]  Ed Anderson,et al.  LAPACK Users' Guide , 1995 .

[41]  S. Godunov Problem of the dichotomy of the spectrum of a matrix , 1986 .

[42]  Inderjit S. Dhillon,et al.  The design and implementation of the MRRR algorithm , 2006, TOMS.

[43]  Robert A. van de Geijn,et al.  SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp.

[44]  Paul Willems,et al.  On MR3-type Algorithms for the Tridiagonal Symmetric Eigenproblem and the Bidiagonal SVD , 2018 .

[45]  J. D. Roberts,et al.  Linear model reduction and solution of the algebraic Riccati equation by use of the sign function , 1980 .

[46]  Lars Karlsson,et al.  Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures , 2011, Parallel Comput.

[47]  Jack J. Dongarra,et al.  Scheduling two-sided transformations using tile algorithms on multicore architectures , 2010, Sci. Program.

[48]  James Demmel,et al.  ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance , 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[49]  Inderjit S. Dhillon,et al.  Orthogonal Eigenvectors and Relative Gaps , 2003, SIAM J. Matrix Anal. Appl.

[50]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.

[51]  H. Schwarz Tridiagonalization of a symmetric band matrix , 1968 .

[52]  Matteo Frigo,et al.  Cache-oblivious algorithms , 1999, 40th Annual Symposium on Foundations of Computer Science.

[53]  Brian D. Sutton,et al.  The stochastic operator approach to random matrix theory , 2005 .