Paper information - The Performance of Finding Eigenvalues and Eigenvectors of Dense Symmetric Matrices on Distributed Memory Computers

The Performance of Finding Eigenvalues and Eigenvectors of Dense Symmetric Matrices on Distributed Memory Computers

We discuss timing and performance modeling of a routine to find all the eigenvalues and eigenvectors of a dense symmetric matrix on distributed memory computers. The routine, PDSYEVX, is part of the ScaLAPACK library. It is based on bisection and inverse iteration, but is not designed to guarantee orthogonality of eigenvectors in the presence of clustered eigenvalues. We use our validated performance model to conclude that PDSYEVX is very efficient for large enough problem sizes, nearly independently of input and output data layouts. However, efficiency will be low if interprocessor communication is too slow, such as on conventional workstation networks, or if per-processor memory is too small, such as on the Intel Gamma. Modeling also helps us choose the appropriate algorithm to deal with clusters.

James Demmel | Ken Stanley

[1] Jack J. Dongarra, et al. A set of level 3 basic linear algebra subprograms, 1990, TOMS.

[2] B. Parlett. The Symmetric Eigenvalue Problem, 1981.

[3] Jack Dongarra, et al. A User's Guide to PVM Parallel Virtual Machine, 1991.

[4] Jack Dongarra, et al. ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers, 1992, [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation.

[5] Jack Dongarra, et al. PB-BLAS: a set of parallel block basic linear algebra subprograms, 1996.

[6] Jeffery D. Rutter. A Serial Implementation of Cuppen's Divide and Conquer Algorithm, 1991.

[7] R. van de Geijn, et al. A look at scalable dense linear algebra libraries, 1992, Proceedings Scalable High Performance Computing Conference SHPCC-92.

[8] F. Desprez, et al. Performance Complexity of LU Factorization with Efficient Pipelining and Overlap on a Multiprocessor, 2007.

[9] Richard M. Karp, et al. Optimal broadcast and summation in the LogP model, 1993, SPAA '93.

[10] James Demmel, et al. Parallel numerical linear algebra, 1993, Acta Numerica.

[11] R. C. Whaley, et al. LAPACK Working Note 73: Basic Linear Algebra Communication Subprograms: Analysis and Implementation Across Multiple Parallel Architectures, 1994.

[12] Xiaobai Sun, et al. Parallel performance of a symmetric eigensolver based on the invariant subspace decomposition approach, 1994, Proceedings of IEEE Scalable High Performance Computing Conference.