Relative-Error CUR Matrix Decompositions

Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and are thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms, each of which takes as input an $m\times n$ matrix $A$ and a rank parameter $k$. In our first algorithm, $C$ is chosen, and we let $A'=CC^+A$, where $C^+$ is the Moore-Penrose generalized inverse of $C$. In our second algorithm, $C$, $U$, and $R$ are chosen, and we let $A'=CUR$. ($C$ and $R$ are matrices that consist of actual columns and rows, respectively, of $A$, and $U$ is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least $1-\delta$, $\|A-A'\|_F\leq(1+\epsilon)\,\|A-A_k\|_F$, where $A_k$ is the “best” rank-$k$ approximation provided by truncating the SVD of $A$, and where $\|X\|_F$ is the Frobenius norm of the matrix $X$. The number of columns of $C$ and rows of $R$ is a low-degree polynomial in $k$, $1/\epsilon$, and $\log(1/\delta)$. Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants of these matrix decompositions over the last ten years. However, our two algorithms are the first polynomial-time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist. Both of our algorithms are simple, and they take time of the order needed to approximately compute the top $k$ singular vectors of $A$.
The technical crux of our analysis is a novel, intuitive sampling method we introduce in this paper called “subspace sampling.” In subspace sampling, the sampling probabilities depend on the Euclidean norms of the rows of the top singular vectors. This allows us to obtain provable relative-error guarantees by deconvoluting “subspace” information and “size-of-$A$” information in the input matrix. This technique is likely to be useful for other matrix approximation and data analysis problems.
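
The column-selection idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the sampling probability of column $j$ is the squared Euclidean norm of the $j$-th row of the matrix of top $k$ right singular vectors, divided by $k$, and it draws `c` columns independently with replacement. The function names, the exact SVD, and the test matrix are illustrative assumptions. Column rescaling is omitted because it does not change the projection $CC^+A$.

```python
import numpy as np

def subspace_sampling_probs(A, k):
    """Probabilities from the top-k right singular vectors (assumes k <= min(A.shape))."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]                       # k x n; column j is row j of V_k
    p = np.sum(Vk ** 2, axis=0) / k      # squared row norms of V_k, scaled to sum to 1
    return p / p.sum()                   # renormalize to guard against round-off

def column_select(A, k, c, rng):
    """Sample c columns of A and return (indices, C, C C^+ A)."""
    p = subspace_sampling_probs(A, k)
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    C = A[:, idx]
    # The minimum-norm least-squares solution X equals C^+ A, so C @ X = C C^+ A.
    X = np.linalg.lstsq(C, A, rcond=None)[0]
    return idx, C, C @ X

# Hypothetical test matrix: rank 5 plus a small amount of noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
A += 0.01 * rng.standard_normal((60, 40))

idx, C, A_approx = column_select(A, k=5, c=20, rng=rng)
err = np.linalg.norm(A - A_approx, "fro")

# Compare against the best rank-k error, i.e. the tail of the singular values.
s = np.linalg.svd(A, compute_uv=False)
best_k_err = np.sqrt(np.sum(s[5:] ** 2))
print("||A - CC^+A||_F / ||A - A_k||_F =", err / best_k_err)
```

The printed ratio is the empirical counterpart of the $(1+\epsilon)$ factor in the guarantee above. Because $CC^+A$ can have rank larger than $k$, the ratio may fall below 1 on a particular draw.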
