High performance parallel approximate eigensolver for real symmetric matrices

In first-principles electronic structure calculations, one of the most time-consuming tasks is computing the eigensystem of a large symmetric nonlinear eigenvalue problem. The standard approach is an iterative scheme that solves a large symmetric linear eigenvalue problem in each iteration. In the early and intermediate iterations, significant gains in efficiency may result from solving the eigensystem to reduced accuracy; as the iteration nears convergence, the eigensystem can be computed to the required accuracy. The main contribution of this dissertation is an efficient parallel approximate eigensolver that computes eigenpairs of a real symmetric matrix to reduced accuracy. This eigensolver consists of three major parts: (1) a parallel block divide-and-conquer algorithm that computes the approximate eigenpairs of a block tridiagonal matrix to prescribed accuracy; (2) a parallel block tridiagonalization algorithm that constructs a block tridiagonal matrix from a sparse or "effectively" sparse matrix, that is, a matrix with many small elements that can be regarded as zeros without affecting the prescribed accuracy of the eigenvalues; (3) a parallel orthogonal block tridiagonal reduction algorithm that reduces a dense real symmetric matrix to block tridiagonal form using similarity transformations with a high ratio of level 3 BLAS operations. Depending on the structure of the input matrix, the parallel approximate eigensolver chooses a suitable combination of the three algorithms and computes all eigenpairs of the input matrix to prescribed accuracy. Numerical results show that the parallel approximate eigensolver is efficient and accurate to the prescribed tolerance. The time required to compute the approximate eigenpairs decreases significantly as the accuracy tolerance is relaxed.
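
The notion of an "effectively" sparse matrix in part (2) can be illustrated with a small serial sketch. It is not the dissertation's block tridiagonalization algorithm; it only shows why small elements may be dropped. By Weyl's inequality, zeroing entries of a symmetric matrix shifts each eigenvalue by at most the 2-norm of the dropped part, which is bounded by its Frobenius norm. The function names `drop_small_entries` and `half_bandwidth`, the conservative threshold `tol / n`, and the test matrix are all assumptions made for illustration.

```python
import numpy as np

def drop_small_entries(a, tol):
    """Zero all entries with |a_ij| <= tol / n (illustrative threshold).

    If E holds the dropped entries, then ||E||_2 <= ||E||_F <= tol,
    so by Weyl's inequality no eigenvalue moves by more than tol.
    """
    n = a.shape[0]
    threshold = tol / n
    return np.where(np.abs(a) <= threshold, 0.0, a)

def half_bandwidth(a):
    """Largest |i - j| over the nonzero entries of a."""
    rows, cols = np.nonzero(a)
    return int(np.max(np.abs(rows - cols))) if rows.size else 0

# Hypothetical test matrix: a tridiagonal part plus tiny symmetric noise.
rng = np.random.default_rng(0)
n = 50
noise = 1e-10 * rng.standard_normal((n, n))
a = (np.diag(np.arange(1.0, n + 1))
     + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)
     + (noise + noise.T) / 2)

tol = 1e-6
b = drop_small_entries(a, tol)

# Largest eigenvalue shift caused by dropping the small entries.
shift = np.max(np.abs(np.linalg.eigvalsh(a) - np.linalg.eigvalsh(b)))
print("half bandwidth before:", half_bandwidth(a))
print("half bandwidth after: ", half_bandwidth(b))
print("max eigenvalue shift within tol:", shift <= tol)
```

In this sketch the thresholded matrix is tridiagonal while its eigenvalues agree with those of the dense input to within `tol`, which is the property the approximate eigensolver relies on when the prescribed accuracy is low.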
