Efficient GPU implementation of randomized SVD and its applications

Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression and deep learning algorithms. Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and running time. In this work, we leverage efficient processing operations that can be run in parallel on modern Graphics Processing Units (GPUs), the predominant computing architecture used, e.g., in deep learning, to reduce the computational burden of computing matrix decompositions. More specifically, we reformulate the randomized decomposition problem to incorporate fast matrix multiplication operations (BLAS-3) as building blocks. We show that this formulation, combined with fast random number generators, makes it possible to fully exploit the potential of parallel processing implemented in GPUs. Our extensive evaluation confirms the superiority of this approach over competing methods, and we release the results of this research as part of the official CUDA implementation.

Keywords: Matrix decompositions, randomized SVD, eigenvalues, CUDA, GPU
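To make the idea of building a randomized SVD from matrix multiplications concrete, here is a minimal CPU sketch in Python with NumPy. It follows the generic randomized range-finder scheme from the randomized-decomposition literature and is not the authors' CUDA implementation. The function name `randomized_svd` and the parameters `oversample` and `n_power_iter` are illustrative assumptions, as are their default values.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate rank-`rank` SVD of A (hypothetical helper, illustration only)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sketch width: target rank plus a small oversampling margin (assumed default).
    k = min(rank + oversample, m, n)

    # 1. Random test matrix. On a GPU this step would rely on a fast
    #    random number generator.
    Omega = rng.standard_normal((n, k))

    # 2. Sample the range of A with one dense matrix product (a BLAS-3 GEMM).
    Q, _ = np.linalg.qr(A @ Omega)

    # 3. Optional power iterations sharpen the captured subspace; each pass
    #    is again dominated by matrix products. QR keeps the basis stable.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # 4. Project A onto the small subspace and decompose the k x n matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)

    # 5. Lift the left singular vectors back to the original space.
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank, :]


# Usage: recover an exactly rank-5 matrix from its randomized factorization.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, rank=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

In this sketch the expensive steps are the products `A @ Omega`, `A.T @ Q`, `A @ Z` and `Q.T @ A`. The only full SVD is applied to the small projected matrix `B`, which is why a formulation of this kind maps well onto hardware optimized for dense matrix multiplication.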
