Bisection and twisted SVD on GPU

The singular value decomposition (SVD) is one of the most important factorizations in matrix computation. However, computing the SVD remains time-consuming, especially when the matrix dimension exceeds tens of thousands. In this paper, we present a high-performance approach called “Bisection and Twisted” (BT) for solving the bidiagonal SVD. Because modern general-purpose GPUs offer substantial advantages for parallel computation, we implement the BT algorithm on single and multiple GPUs. With our carefully designed GPU kernels, the BT algorithm is about 10 times faster than the MKL divide-and-conquer routine DBDSDC on an 8-core 2.53 GHz CPU, and 36 times faster than the CULA QR routine DBDSQR on the same GPUs. Additionally, the BT algorithm can compute the SVD of a 1 million by 1 million matrix with only two GPUs. To the best of our knowledge, no other implementation has reached this scale.
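As a rough illustration of the bisection half of the approach, the following is a minimal CPU sketch in Python, not the paper's GPU kernels. It assumes an upper bidiagonal matrix B given by its diagonal `d` and superdiagonal `e`, forms the tridiagonal matrix T = BᵀB implicitly, and uses a Sturm count to locate each singular value. The function names (`negcount`, `kth_singular_value`, `singular_values`) and the pivot guard `pivmin` are assumptions made for this example. Forming BᵀB is a simplification that can lose accuracy for tiny singular values, and the twisted factorization step that yields singular vectors is not shown.

```python
import math
import sys


def negcount(d, e, x, pivmin):
    """Sturm count: number of eigenvalues of T = B^T B that are below x.

    B is upper bidiagonal with diagonal d (length n) and superdiagonal e
    (length n - 1). The count is the number of negative pivots in the
    LDL^T factorization of T - x*I.
    """
    count = 0
    q = 1.0
    for i in range(len(d)):
        # Diagonal entry of T: d_i^2 + e_{i-1}^2 (with e_{-1} = 0).
        a = d[i] * d[i] + (e[i - 1] * e[i - 1] if i > 0 else 0.0)
        # Squared off-diagonal entry of T: (d_{i-1} * e_{i-1})^2.
        b2 = (d[i - 1] * e[i - 1]) ** 2 if i > 0 else 0.0
        q = a - x - (b2 / q if i > 0 else 0.0)
        if abs(q) < pivmin:
            q = -pivmin  # guard against division by zero in the next step
        if q < 0.0:
            count += 1
    return count


def kth_singular_value(d, e, k, max_iter=200):
    """Bisection for the k-th smallest singular value (k is 0-based)."""
    scale = max([1.0] + [(d[i] * e[i]) ** 2 for i in range(len(e))])
    pivmin = sys.float_info.min * scale
    lo = 0.0
    # The Frobenius norm bounds the largest singular value from above.
    hi = math.sqrt(sum(v * v for v in d) + sum(v * v for v in e))
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:
            break  # interval can no longer be split in floating point
        if negcount(d, e, mid * mid, pivmin) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def singular_values(d, e):
    """All singular values in ascending order; each k is independent."""
    return [kth_singular_value(d, e, k) for k in range(len(d))]
```

Each call to `kth_singular_value` is independent of the others, which is the property that makes bisection a natural fit for the massively parallel GPU setting described above.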
