A Parallel Structured Divide-and-Conquer Algorithm for Symmetric Tridiagonal Eigenvalue Problems

In this article, a parallel structured divide-and-conquer (PSDC) eigensolver for symmetric tridiagonal matrices is proposed, built on ScaLAPACK and a parallel structured matrix multiplication algorithm called PSMMA. Computing the eigenvectors via matrix-matrix multiplications is the most computationally expensive part of the divide-and-conquer algorithm, and one of the matrices involved in these multiplications is a rank-structured Cauchy-like matrix. By exploiting this property, PSMMA constructs its local matrices directly from the generators of the Cauchy-like matrix without any communication, and further reduces the computational cost by using a structured low-rank approximation algorithm. Both the communication and computation costs are therefore reduced. Experimental results show that both PSMMA and PSDC are highly scalable and scale to at least 4096 processes. PSDC scales better than the PHDC solver proposed in [16], which only scaled to 300 processes for the same matrices. Compared with PDSTEDC in ScaLAPACK, PSDC is always faster and achieves 1.4x-1.6x speedups for some matrices with few deflations. PSDC is also comparable with ELPA: it is faster than ELPA when using few processes and slightly slower when using many.
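The sketch below illustrates the idea behind the communication-free construction described above; it is a minimal toy example, not the paper's PSMMA or ScaLAPACK code. Assuming a 2D block-cyclic data layout, each process holds only the short generator vectors of a Cauchy-like matrix with entries z_i/(d_i - lam_j) (as in the rank-one-update eigenvector matrix of the divide-and-conquer step, normalization omitted) and assembles its own local block without exchanging any matrix entries. All names (local_cauchy_block, the block size nb, the 2x2 grid) are illustrative assumptions.

```python
# Minimal sketch, NOT the PSMMA/ScaLAPACK implementation: every "process"
# owns the replicated generator vectors z, d, lam and builds only its own
# block-cyclic piece of C[i, j] = z[i] / (d[i] - lam[j]) locally, so no
# matrix entries need to be communicated.  Column normalization and the
# low-rank compression of off-diagonal blocks used by PSMMA are omitted.
import numpy as np

def local_cauchy_block(z, d, lam, prow, pcol, nprow, npcol, nb):
    """Local block of C for the process at (prow, pcol) in an nprow x npcol
    grid, assuming a block-cyclic distribution with block size nb."""
    n = len(d)
    # Global row/column indices owned by this process under the
    # block-cyclic mapping: block index modulo the grid dimension.
    rows = np.array([i for i in range(n) if (i // nb) % nprow == prow], dtype=int)
    cols = np.array([j for j in range(n) if (j // nb) % npcol == pcol], dtype=int)
    # Each entry depends only on the generators, hence is computed locally.
    return z[rows, None] / (d[rows, None] - lam[None, cols])

# Toy usage: four "processes" on a 2 x 2 grid build their blocks from the
# same generators (in the eigensolver, d are the old eigenvalues, lam the
# updated ones, and z the rank-one update vector).
n, nb = 8, 2
rng = np.random.default_rng(0)
d = np.sort(rng.normal(size=n))
lam = d + 0.5 * np.diff(np.append(d, d[-1] + 1.0))  # strictly interlacing, so d[i] - lam[j] != 0
z = rng.normal(size=n)
blocks = {(p, q): local_cauchy_block(z, d, lam, p, q, 2, 2, nb)
          for p in range(2) for q in range(2)}
```

In the actual algorithm, the locally assembled blocks would then enter the distributed matrix-matrix multiplication that forms the eigenvectors, with off-diagonal blocks additionally compressed by a low-rank approximation before being multiplied.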

[1] Paolo Bientinesi, et al. High-Performance Solvers for Dense Hermitian Eigenproblems, 2012, SIAM J. Sci. Comput.

[2] James Demmel, et al. Algorithm 880: A testing infrastructure for symmetric tridiagonal eigensolvers, 2008, TOMS.

[3] Ming Gu, et al. Stable and Efficient Algorithms for Structured Systems of Linear Equations, 1998, SIAM J. Matrix Anal. Appl.

[4] Wolfgang Hackbusch, et al. A Sparse Matrix Arithmetic Based on H-Matrices. Part I: Introduction to H-Matrices, 1999, Computing.

[5] Chao Yang, et al. 623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores, 2016, Int. J. High Perform. Comput. Appl.

[6] Robert A. van de Geijn, et al. SUMMA: scalable universal matrix multiplication algorithm, 1995, Concurr. Pract. Exp.

[7] Xiangke Liao, et al. An improved divide-and-conquer algorithm for the banded matrices with narrow bandwidths, 2016, Comput. Math. Appl.

[8] Jean-Yves L'Excellent, et al. Improving Multifrontal Methods by Means of Block Low-Rank Representations, 2015, SIAM J. Sci. Comput.

[9] Jie Liu, et al. New fast divide-and-conquer algorithms for the symmetric tridiagonal eigenvalue problem, 2015, Numer. Linear Algebra Appl.

[10] Torsten Hoefler, et al. Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication, 2019, SC.

[11] Jaeyoung Choi. A new parallel matrix multiplication algorithm on distributed-memory concurrent computers, 1998, Concurr. Pract. Exp.

[12] James Demmel, et al. Minimizing Communication in Numerical Linear Algebra, 2009, SIAM J. Matrix Anal. Appl.

[13] Jack Dongarra, et al. Distributed-memory lattice H-matrix factorization, 2019, Int. J. High Perform. Comput. Appl.

[14] James Demmel, et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[15] James Demmel, et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, 2011, Euro-Par.

[16] Xuebin Chi, et al. An Accelerated Divide-and-Conquer Algorithm for the Bidiagonal SVD Problem, 2014, SIAM J. Matrix Anal. Appl.

[17] J. Bunch, et al. Rank-one modification of the symmetric eigenproblem, 1978.

[18] Thomas Kailath, et al. Fast Gaussian elimination with partial pivoting for matrices with displacement structure, 1995.

[19] Jie Liu, et al. An efficient hybrid tridiagonal divide-and-conquer algorithm on distributed memory architectures, 2016, J. Comput. Appl. Math.

[20] Stanley C. Eisenstat, et al. A Divide-and-Conquer Algorithm for the Symmetric Tridiagonal Eigenproblem, 1995, SIAM J. Matrix Anal. Appl.

[21] Mark Tygert, et al. Fast algorithms for spherical harmonic expansions, II, 2008, J. Comput. Phys.

[22] Geoffrey C. Fox, et al. Matrix algorithms on a hypercube I: Matrix multiplication, 1987, Parallel Comput.

[23] Ming Gu, et al. Studies in numerical linear algebra, 1993.

[24] Lynn Elliot Cannon, et al. A cellular computer to implement the Kalman filter algorithm, 1969.

[25] A. Marek, et al. The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science, 2014, Journal of Physics: Condensed Matter.

[26] V. Pan, et al. Polynomial and matrix computations (vol. 1): fundamental algorithms, 1994.

[27] Shivkumar Chandrasekaran, et al. A Fast Solver for HSS Representations via Sparse Matrices, 2006, SIAM J. Matrix Anal. Appl.

[28] S. Eisenstat, et al. A Stable and Efficient Algorithm for the Rank-One Modification of the Symmetric Eigenproblem, 1994, SIAM J. Matrix Anal. Appl.

[29] Jack J. Dongarra, et al. A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures, 1999, SIAM J. Sci. Comput.

[30] James Demmel, et al. Communication-optimal Parallel and Sequential QR and LU Factorizations, 2008, SIAM J. Sci. Comput.

[31] V. Pan. Structured Matrices and Polynomials: Unified Superfast Algorithms, 2001.

[32] Gene H. Golub, et al. Matrix computations (3rd ed.), 1996.

[33] Pieter Ghysels, et al. A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization, 2015, ACM Trans. Math. Softw.

[34] James Demmel, et al. Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions, 2013, IEEE 27th International Symposium on Parallel and Distributed Processing.

[35] Jaeyoung Choi, et al. PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers, 1994, Concurr. Pract. Exp.

[36] Lukas Krämer, et al. Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations, 2011, Parallel Comput.

[37] B. Parlett, et al. Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices, 2004.

[38] James Demmel, et al. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication, 2013, IEEE 27th International Symposium on Parallel and Distributed Processing.

[39] J. Cuppen. A divide and conquer method for the symmetric tridiagonal eigenproblem, 1980.

[40] Peter Arbenz, et al. Divide and conquer algorithms for the bandsymmetric eigenvalue problem, 1992, Parallel Comput.

[41] C. Pan. On the existence and computation of rank-revealing LU factorizations, 2000.

[42] T. Pals. Fast Matrix Algorithms for Hierarchically Semi-Separable Representations, 2002.

[43] Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, TOMS.

[44] Canqun Yang, et al. MilkyWay-2 supercomputer: system and application, 2014, Frontiers of Computer Science.
