Low rank approximation and regression in input sparsity time

We design a new distribution over poly(rε<sup>-1</sup>) × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ‖SAx‖<sub>2</sub> = (1 ± ε)‖Ax‖<sub>2</sub> simultaneously for all x ∈ R<sup>d</sup>. Such a matrix S is called a <i>subspace embedding</i>. Furthermore, SA can be computed in O(nnz(A)) + ~O(r<sup>2</sup>ε<sup>-2</sup>) time, where nnz(A) is the number of non-zero entries of A. (Here ~O(f) = f ⋅ log<sup>O(1)</sup>(f).) This improves over all previous subspace embeddings, which required at least Ω(nd log d) time to achieve this property. We call our matrices S <i>sparse embedding matrices</i>.

Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓ<sub>p</sub>-regression:

- To output an x' for which ‖Ax' − b‖<sub>2</sub> ≤ (1+ε) min<sub>x</sub> ‖Ax − b‖<sub>2</sub> for an n × d matrix A and an n × 1 column vector b, we obtain an algorithm running in O(nnz(A)) + ~O(d<sup>3</sup>ε<sup>-2</sup>) time, and another in O(nnz(A) log(1/ε)) + ~O(d<sup>3</sup> log(1/ε)) time.
- To obtain a decomposition of an n × n matrix A into a product of an n × k matrix L, a k × k diagonal matrix D, and an n × k matrix W, for which ‖A − LDW<sup>T</sup>‖<sub>F</sub> ≤ (1+ε)‖A − A<sub>k</sub>‖<sub>F</sub>, where A<sub>k</sub> is the best rank-k approximation of A, our algorithm runs in O(nnz(A)) + ~O(nk<sup>2</sup>ε<sup>-4</sup> log n + k<sup>3</sup>ε<sup>-5</sup> log<sup>2</sup> n) time.
- To output an approximation to all leverage scores of an n × d input matrix A simultaneously, with constant relative error, our algorithm runs in O(nnz(A) log n) + ~O(r<sup>3</sup>) time.
- To output an x' for which ‖Ax' − b‖<sub>p</sub> ≤ (1+ε) min<sub>x</sub> ‖Ax − b‖<sub>p</sub> for an n × d matrix A and an n × 1 column vector b, we obtain an algorithm running in O(nnz(A) log n) + poly(rε<sup>-1</sup>) time, for any constant 1 ≤ p < ∞.

We optimize the polynomial factors in the running times stated above and show various tradeoffs. Finally, we provide preliminary experimental results which suggest that our algorithms are of interest in practice.
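A sparse embedding matrix of this kind can be instantiated as a CountSketch matrix: each column of S has a single nonzero entry, equal to ±1, placed in a uniformly random row, so computing SA touches each nonzero of A exactly once. Below is a minimal numpy sketch of this construction applied to overconstrained least-squares regression; the sketch size m, the function names, and the synthetic demo problem are illustrative assumptions, not the paper's exact parameters or bounds.

```python
import numpy as np

def sparse_embedding(n, m, rng):
    """Sample the (h, sigma) pair defining a sparse embedding S in R^{m x n}:
    column i of S has a single nonzero sigma[i] in row h[i]."""
    h = rng.integers(0, m, size=n)           # random target row for input row i
    sigma = rng.choice([-1.0, 1.0], size=n)  # random sign for input row i
    return h, sigma

def apply_embedding(h, sigma, m, A):
    """Compute S @ A without materializing S, via a scatter-add over rows."""
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, h, sigma[:, None] * A)  # row i of A lands in row h[i], signed
    return SA

# Demo on an overconstrained least-squares problem: min_x ||Ax - b||_2.
rng = np.random.default_rng(0)
n, d = 20_000, 15
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

m = 40 * d * d  # illustrative sketch size; the theory asks for m = poly(d / eps)
h, sigma = sparse_embedding(n, m, rng)
SA = apply_embedding(h, sigma, m, A)
Sb = apply_embedding(h, sigma, m, b[:, None])

x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Ratio of the sketched solution's residual to the optimal residual;
# with m large enough this should be close to 1.
print(np.linalg.norm(A @ x_sketch[:, 0] - b) / np.linalg.norm(A @ x_exact - b))
```

The scatter-add in apply_embedding is what realizes the O(nnz(A)) cost: S is never formed explicitly, and each row of A is visited exactly once. The regression is then solved on the small m × d sketched problem instead of the original n × d one.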
