Low rank approximation and regression in input sparsity time

We design a new distribution over m × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥_2 = (1 ± ε)∥Ax∥_2 simultaneously for all x ∈ R^d. Here m is bounded by a polynomial in rε^{-1}, and the parameter ε ∈ (0, 1]. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) time, where nnz(A) is the number of nonzero entries of A. This improves over all previous subspace embeddings, for which computing SA required at least Ω(nd log d) time. We call these S sparse embedding matrices.

Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓ_p regression. More specifically, let b be an n × 1 vector, let ε > 0 be a small enough value, and let k, p ⩾ 1 be integers. Our results include the following.

Regression: The regression problem is to find a d × 1 vector x' for which ∥Ax' − b∥_p ⩽ (1 + ε) min_x ∥Ax − b∥_p. For the Euclidean case p = 2, we obtain one algorithm running in O(nnz(A)) + Õ(d^3 ε^{-2}) time, and another in O(nnz(A) log(1/ε)) + Õ(d^3 log(1/ε)) time. (Here, Õ(f) = f · log^{O(1)}(f).) More generally, for p ∈ [1, ∞) we obtain an algorithm running in O(nnz(A) log n) + O(rε^{-1})^C time, for a fixed constant C.

Low-rank approximation: We give an algorithm to obtain a rank-k matrix Â_k such that ∥A − Â_k∥_F ⩽ (1 + ε)∥A − A_k∥_F, where A_k is the best rank-k approximation to A. (That is, A_k is the output of principal components analysis, produced by a truncated singular value decomposition, useful for latent semantic indexing and many other statistical problems.) Our algorithm runs in O(nnz(A)) + Õ(nk^2 ε^{-4} + k^3 ε^{-5}) time.

Leverage scores: We give an algorithm to estimate the leverage scores of A, up to a constant factor, in O(nnz(A) log n) + Õ(r^3) time.
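To make the construction concrete, here is a minimal numpy sketch of a sparse embedding matrix of the kind described above (each column has a single ±1 entry in a uniformly random row, so applying it to A touches each nonzero of A once, giving the O(nnz(A)) cost), together with its use for the p = 2 regression problem: solve the small sketched problem min_x ∥SAx − Sb∥_2 in place of the original. The function names and parameter choices here are ours, not the paper's, and the dense representation of S is for clarity only; a serious implementation would apply the sketch implicitly via the row hashes and signs.

```python
import numpy as np

def sparse_embedding(m, n, rng):
    """Build an m x n sparse embedding matrix S.

    Each column holds exactly one nonzero entry, equal to +1 or -1,
    placed in a uniformly random row. Consequently S @ A can be
    computed with one update per nonzero of A, i.e. in O(nnz(A)) time.
    """
    rows = rng.integers(0, m, size=n)        # hash each coordinate to a row
    signs = rng.choice([-1.0, 1.0], size=n)  # independent random signs
    S = np.zeros((m, n))
    S[rows, np.arange(n)] = signs
    return S

def sketched_least_squares(A, b, m, rng):
    """Approximately solve min_x ||Ax - b||_2 via a sparse embedding.

    Sketches both A and b down to m rows, then solves the small
    least-squares problem exactly.
    """
    S = sparse_embedding(m, A.shape[0], rng)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```

With m chosen as a suitable polynomial in d/ε, the sketched solution's residual is within a (1 + ε) factor of optimal with constant probability, matching the p = 2 guarantee stated above.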
