Cache efficient bidiagonalization using BLAS 2.5 operators

On cache-based computer architectures, Householder bidiagonalization as implemented in current standard algorithms accounts for a significant portion of the execution time when computing matrix singular values and vectors. In this paper we reorganize the sequence of operations in Householder bidiagonalization of a general m × n matrix so that two matrix-vector multiplications (_GEMV) can be performed with a single pass of the unreduced trailing part of the matrix through cache. Two new BLAS operations approximately halve the transfer of data from main memory to cache, reducing execution times by up to 25 per cent. We give detailed algorithm descriptions and compare timings with the current LAPACK bidiagonalization algorithm.
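To illustrate the idea of fusing the two matrix-vector products, the following C sketch reads each entry of the trailing matrix only once while accumulating both A^T·y and A·z. This is a minimal, hypothetical kernel assuming column-major (LAPACK-style) storage; the routine name fused_gemvt and the exact update formulas are illustrative and do not reproduce the BLAS 2.5 interface standardized by the BLAST Forum or the authors' implementation.

```c
#include <stddef.h>

/* Hedged sketch: fuse  x := x + A^T * y  and  w := w + A * z
 * so that each element of the m-by-n matrix A is loaded once.
 * A is column-major with leading dimension lda. */
static void fused_gemvt(size_t m, size_t n, const double *A, size_t lda,
                        const double *y, const double *z,
                        double *x, double *w)
{
    for (size_t j = 0; j < n; ++j) {
        const double *aj = A + j * lda;   /* column j, touched exactly once */
        const double zj  = z[j];
        double dot = 0.0;
        for (size_t i = 0; i < m; ++i) {
            dot  += aj[i] * y[i];          /* contributes to (A^T * y)[j]   */
            w[i] += aj[i] * zj;            /* contributes to (A * z)[i]     */
        }
        x[j] += dot;
    }
}
```

Performing both products in the same sweep is what roughly halves main-memory traffic relative to calling _GEMV twice, since the separate calls would each stream the full trailing matrix through cache.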
