A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures

One approach to solving the nonsymmetric eigenvalue problem in parallel is to parallelize the QR algorithm. Not long ago, this was widely considered to be a hopeless task. Recent efforts have led to significant advances, although the methods proposed up to now have suffered from scalability problems. This paper discusses an approach to parallelizing the QR algorithm that greatly improves scalability. A theoretical analysis indicates that the algorithm is ultimately not scalable, but the nonscalability does not become evident until the matrix dimension is enormous. Experiments on the Intel Paragon system, the IBM SP2 supercomputer, the SGI Origin 2000, and the Intel ASCI Option Red supercomputer are reported.
