Condensed forms for the symmetric eigenvalue problem on multi‐threaded architectures

We investigate the performance of the LAPACK and Successive Band Reduction (SBR) toolbox routines for reducing a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric eigenvalue problem, on general‐purpose multi‐core processors. In response to advances in hardware accelerators, we also modify the SBR toolbox code to off‐load a significant part of the operations to a graphics processor (GPU). The performance results illustrate the parallelism and scalability of these algorithms on current high‐performance multi‐core and many‐core architectures. Copyright © 2010 John Wiley & Sons, Ltd.
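The abstract gives no algorithmic detail, so the following is only a minimal sketch of what "reduction to tridiagonal form" means, assuming the textbook unblocked Householder approach. It is not the LAPACK or SBR code evaluated in the paper, which rely on blocked, BLAS-based kernels. The function name `householder_tridiagonalize` and the list-of-lists matrix layout are hypothetical choices made for illustration.

```python
import math


def householder_tridiagonalize(a):
    """Return T = Q^T A Q, tridiagonal, for a symmetric matrix A.

    Illustrative unblocked Householder reduction on a list of lists.
    The input is not modified. Q itself is not accumulated.
    """
    n = len(a)
    t = [row[:] for row in a]
    for k in range(n - 2):
        # Part of column k that lies below the diagonal.
        x = [t[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(c * c for c in x))
        if norm_x == 0.0:
            continue  # column is already in the required form
        # Choose the sign of alpha to avoid cancellation in v[0].
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(c * c for c in v))
        v = [c / norm_v for c in v]
        m = len(v)
        # Left update: rows k+1..n-1 become (I - 2 v v^T) times those rows.
        for j in range(n):
            dot = sum(v[i] * t[k + 1 + i][j] for i in range(m))
            for i in range(m):
                t[k + 1 + i][j] -= 2.0 * v[i] * dot
        # Right update: columns k+1..n-1 are multiplied by (I - 2 v v^T).
        for i in range(n):
            dot = sum(t[i][k + 1 + j] * v[j] for j in range(m))
            for j in range(m):
                t[i][k + 1 + j] -= 2.0 * dot * v[j]
    return t
```

Because each step is an orthogonal similarity transformation, the eigenvalues of the result match those of the input, which is why the reduction can serve as a preprocessing stage for a symmetric eigensolver. Entries outside the three central diagonals are zero only up to rounding error in this sketch.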
