Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures

We consider parallel reduction of a real matrix to Hessenberg form using orthogonal transformations. Standard Hessenberg reduction algorithms reduce the columns of the matrix from left to right in either a blocked or unblocked fashion. However, the standard blocked variant performs 20% of its floating-point operations as matrix-vector multiplications. We show that a two-stage approach, consisting of an intermediate reduction to block Hessenberg form, speeds up the reduction by avoiding matrix-vector multiplications. We describe and evaluate a new high-performance implementation of the two-stage approach that attains significant speedups over the one-stage approach. The key components are a dynamically scheduled implementation of Stage 1 and a blocked, adaptively load-balanced implementation of Stage 2.
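To make the terminology concrete, the following is a minimal, unoptimized sketch in pure Python (matrices as lists of lists) of the column-by-column Householder reduction described above. The helper names (`reduce_to_band_hessenberg`, `is_band_hessenberg`, and so on) and the bandwidth parameter `r` are hypothetical, chosen only for illustration. With `r = 1` the sketch is the standard unblocked one-stage reduction to Hessenberg form. With `r > 1` it stops at block Hessenberg form with `r` subdiagonals, the intermediate form targeted by Stage 1. It is not the blocked, dynamically scheduled Stage 1 of this work: every reflector is still applied through matrix-vector products and rank-1 updates. The naive "Stage 2" in the usage section simply reruns the dense `r = 1` reduction and does not exploit the block Hessenberg structure.

```python
import math
import random


def householder(x):
    """Return (v, beta) such that (I - beta * v v^T) x is a multiple of e_1."""
    norm_x = math.sqrt(sum(t * t for t in x))
    v = list(x)
    if norm_x == 0.0:
        return v, 0.0  # x is already zero: use the identity
    # Add sign(x[0]) * ||x|| to the first entry to avoid cancellation.
    v[0] += math.copysign(norm_x, x[0])
    beta = 2.0 / sum(t * t for t in v)
    return v, beta


def reduce_to_band_hessenberg(A, r=1):
    """Unblocked two-sided Householder reduction (hypothetical helper).

    Overwrites the square matrix A (list of lists) with Q^T A Q such that
    A[i][j] == 0 whenever i > j + r. r = 1 gives Hessenberg form; r > 1
    gives block Hessenberg form with r subdiagonals. Q is not formed.
    """
    assert r >= 1
    n = len(A)
    for j in range(n - r - 1):
        rows = range(j + r, n)  # rows (and columns) touched by this reflector
        v, beta = householder([A[i][j] for i in rows])
        if beta == 0.0:
            continue
        # Left update: A[rows, j:] -= beta * v (v^T A[rows, j:]).
        # Columns left of j are already zero in these rows.
        for k in range(j, n):
            s = beta * sum(v[t] * A[i][k] for t, i in enumerate(rows))
            for t, i in enumerate(rows):
                A[i][k] -= s * v[t]
        # Right update: A[:, rows] -= beta * (A[:, rows] v) v^T.
        # Column j is not in `rows`, so its new zeros are preserved.
        for i in range(n):
            s = beta * sum(A[i][c] * v[t] for t, c in enumerate(rows))
            for t, c in enumerate(rows):
                A[i][c] -= s * v[t]
        # Store exact zeros instead of rounding-level residue.
        for i in range(j + r + 1, n):
            A[i][j] = 0.0
    return A


def is_band_hessenberg(A, r):
    """True if every entry below the r-th subdiagonal is exactly zero."""
    n = len(A)
    return all(A[i][j] == 0.0 for j in range(n) for i in range(j + r + 1, n))


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def trace_of_square(A):
    n = len(A)
    return sum(A[i][k] * A[k][i] for i in range(n) for k in range(n))


def frobenius(A):
    return math.sqrt(sum(a * a for row in A for a in row))


if __name__ == "__main__":
    random.seed(0)
    n, r = 8, 3
    A = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    B = reduce_to_band_hessenberg([row[:] for row in A], r)  # block Hessenberg
    H = reduce_to_band_hessenberg([row[:] for row in B], 1)  # naive "Stage 2"
    print(is_band_hessenberg(B, r), is_band_hessenberg(H, 1))
    print(abs(trace(A) - trace(H)), abs(frobenius(A) - frobenius(H)))
```

Because both stages apply orthogonal similarity transformations, the trace, the trace of A^2, and the Frobenius norm of the input are preserved up to rounding. This makes them a cheap sanity check for any implementation of the reduction, one-stage or two-stage.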
