A Novel Parallel QR Algorithm for Hybrid Distributed Memory HPC Systems

A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while aggressive early deflation aims to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are used to port the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerical experiments, including a number of examples from applications, confirm that our parallel QR algorithm outperforms the existing ScaLAPACK code: for sufficiently large problems, the implementation is one to two orders of magnitude faster.
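The remark about level 3 BLAS can be illustrated with a small sketch. It is not the paper's implementation: it assumes plain Givens rotations as a stand-in for the transformations generated while chasing bulges, and every function name (`apply_one_by_one`, `accumulate`, `demo`) is hypothetical. The point it shows is that a sequence of small orthogonal transformations can either be applied to a block row by row (level 1 style) or first accumulated into one small orthogonal matrix that is then applied with a single matrix-matrix product (level 3 style), with the same result.

```python
import math
import random


def matmul(a, b):
    """Dense matrix-matrix product on lists of lists (stand-in for a GEMM call)."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def embedded_rotation(n, k, c, s):
    """n-by-n identity with a Givens rotation acting on rows k and k+1."""
    g = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g[k][k], g[k][k + 1] = c, s
    g[k + 1][k], g[k + 1][k + 1] = -s, c
    return g


def apply_one_by_one(rotations, block):
    """Level 1 style: each rotation updates two rows of the block directly."""
    out = [row[:] for row in block]
    for k, c, s in rotations:
        for j in range(len(out[0])):
            x, y = out[k][j], out[k + 1][j]
            out[k][j] = c * x + s * y
            out[k + 1][j] = -s * x + c * y
    return out


def accumulate(rotations, n):
    """Accumulate all rotations into one small orthogonal matrix U."""
    u = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k, c, s in rotations:
        # Later rotations multiply from the left, matching the order of application.
        u = matmul(embedded_rotation(n, k, c, s), u)
    return u


def demo(n=6, cols=8, seed=0):
    """Return the largest entrywise difference between the two update styles."""
    rng = random.Random(seed)
    block = [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(n)]
    rotations = []
    for k in range(n - 1):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        rotations.append((k, math.cos(theta), math.sin(theta)))

    direct = apply_one_by_one(rotations, block)
    # Level 3 style: one matrix-matrix product with the accumulated U.
    blocked = matmul(accumulate(rotations, n), block)
    return max(abs(direct[i][j] - blocked[i][j])
               for i in range(n) for j in range(cols))


if __name__ == "__main__":
    print("max difference between the two update styles:", demo())
```

In a real code the accumulated matrix would be applied with an optimized GEMM routine to long off-diagonal blocks, which is where the performance benefit comes from; the pure Python product here only demonstrates that the two update orders agree.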
