Adaptive Sampling and Fast Low-Rank Matrix Approximation

We prove that any real matrix $A$ contains a subset of at most $4k/\epsilon + 2k \log(k+1)$ rows whose span “contains” a matrix of rank at most $k$ with error only $(1+\epsilon)$ times the error of the best rank-$k$ approximation of $A$. We complement this with an almost matching lower bound by constructing matrices where the span of any $k/2\epsilon$ rows does not “contain” a relative $(1+\epsilon)$-approximation of rank $k$. Our existence result leads to an algorithm that finds such a rank-$k$ approximation in time $O\left( M \left( \frac{k}{\epsilon} + k^{2} \log k \right) + (m+n) \left( \frac{k^{2}}{\epsilon^{2}} + \frac{k^{3} \log k}{\epsilon} + k^{4} \log^{2} k \right) \right)$, i.e., essentially $O(Mk/\epsilon)$, where $M$ is the number of nonzero entries of $A$. The algorithm maintains sparsity, and in the streaming model [12,14,15] it can be implemented using only $2(k+1)(\log(k+1)+1)$ passes over the input matrix and $O\left( \min\{m, n\} \left( \frac{k}{\epsilon} + k^{2} \log k \right) \right)$ additional space. Previous algorithms for low-rank approximation use only one or two passes but obtain only an additive approximation.
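The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the adaptive sampling idea named in the title. It assumes that in each round rows are drawn with probability proportional to their squared residual norm after projection onto the span of the rows chosen so far. The function names (`residual_rows`, `extend_basis`, `adaptive_sample`) and the parameters `rounds` and `rows_per_round` are hypothetical, and the sketch leaves out the paper's parameter choices and the final step that extracts a rank-$k$ matrix from the span.

```python
import random


def residual_rows(A, basis):
    """Return each row of A minus its projection onto the orthonormal rows in basis."""
    E = []
    for row in A:
        r = list(row)
        for q in basis:
            c = sum(x * y for x, y in zip(r, q))
            r = [x - c * y for x, y in zip(r, q)]
        E.append(r)
    return E


def extend_basis(basis, row, tol=1e-12):
    """Gram-Schmidt step: append the normalized residual of row if it is nonzero."""
    r = list(row)
    for q in basis:
        c = sum(x * y for x, y in zip(r, q))
        r = [x - c * y for x, y in zip(r, q)]
    norm = sum(x * x for x in r) ** 0.5
    if norm > tol:
        basis.append([x / norm for x in r])


def adaptive_sample(A, rounds, rows_per_round, seed=0):
    """Pick row indices of A over several rounds (assumed scheme, see lead-in).

    Each round computes the residual of A with respect to the rows picked so
    far and samples new rows with probability proportional to the squared
    norm of their residual.
    """
    rng = random.Random(seed)
    chosen, basis = [], []
    for _ in range(rounds):
        E = residual_rows(A, basis)
        weights = [sum(x * x for x in r) for r in E]
        if sum(weights) <= 1e-12:
            break  # the chosen rows already span (numerically) every row of A
        picks = rng.choices(range(len(A)), weights=weights, k=rows_per_round)
        for i in picks:
            chosen.append(i)
            extend_basis(basis, A[i])
    return chosen, basis


# Small rank-2 example: the third row is the sum of the first two.
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
chosen, basis = adaptive_sample(A, rounds=3, rows_per_round=1)
leftover = sum(x * x for r in residual_rows(A, basis) for x in r)
```

On this toy input the zero row has weight zero and is never drawn, and after two rounds the chosen rows span the whole row space, so `leftover` is numerically zero. Recomputing the residual every round is what makes the scheme adaptive: rows already well explained by earlier picks become unlikely to be picked again.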

[1]  Prabhakar Raghavan,et al.  Computing on data streams , 1999, External Memory Algorithms.

[2]  Alan M. Frieze,et al.  Fast Monte-Carlo algorithms for finding low-rank approximations , 1998, Proceedings 39th Annual Symposium on Foundations of Computer Science.

[3]  Alan M. Frieze,et al.  Clustering in large graphs and matrices , 1999, SODA '99.

[4]  Philip S. Yu,et al.  Fast algorithms for projected clustering , 1999, SIGMOD '99.

[5]  Jiří Matoušek,et al.  On Approximate Geometric k-Clustering , 2000, Discret. Comput. Geom.

[6]  Dimitris Achlioptas,et al.  Fast computation of low rank matrix approximations , 2001, STOC '01.

[7]  Sudipto Guha,et al.  Data-streams and histograms , 2001, STOC '01.

[8]  Ziv Bar-Yossef,et al.  Sampling lower bounds via information theory , 2003, STOC '03.

[9]  Marek Karpinski,et al.  Approximation schemes for clustering problems , 2003, STOC '03.

[10]  Petros Drineas,et al.  Pass efficient algorithms for approximating large matrices , 2003, SODA '03.

[11]  Joan Feigenbaum,et al.  On graph problems in a semi-streaming model , 2005, Theor. Comput. Sci.

[12]  Santosh S. Vempala,et al.  Matrix approximation and projective clustering via volume sampling , 2006, SODA '06.

[13]  Petros Drineas,et al.  Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix , 2006, SIAM J. Comput.

[14]  Sanjeev Arora,et al.  A Fast Random Sampling Algorithm for Sparsifying Matrices , 2006, APPROX-RANDOM.