Paper information - Adaptive Sampling and Fast Low-Rank Matrix Approximation

We prove that any real matrix $A$ contains a subset of at most $4k/\epsilon + 2k \log(k+1)$ rows whose span “contains” a matrix of rank at most $k$ with error only $(1+\epsilon)$ times the error of the best rank-$k$ approximation of $A$. We complement it with an almost matching lower bound by constructing matrices where the span of any $k/2\epsilon$ rows does not “contain” a relative $(1+\epsilon)$-approximation of rank $k$. Our existence result leads to an algorithm that finds such a rank-$k$ approximation in time $O \left( M \left( \frac{k}{\epsilon} + k^{2} \log k \right) + (m+n) \left( \frac{k^{2}}{\epsilon^{2}} + \frac{k^{3} \log k}{\epsilon} + k^{4} \log^{2} k \right) \right)$, i.e., essentially $O(Mk/\epsilon)$, where $M$ is the number of nonzero entries of $A$. The algorithm maintains sparsity, and in the streaming model [12,14,15], it can be implemented using only $2(k+1)(\log(k+1)+1)$ passes over the input matrix and $O \left( \min \{ m, n \} \left( \frac{k}{\epsilon} + k^{2} \log k \right) \right)$ additional space. Previous algorithms for low-rank approximation use only one or two passes but obtain an additive approximation.
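To make the phrase "rows whose span contains a rank-$k$ matrix" concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes that "adaptive sampling" means repeatedly picking rows with probability proportional to their squared residual norm after projecting onto the rows chosen so far, and it uses dense NumPy arrays, so it neither maintains sparsity nor counts passes. The function names `row_span_basis`, `adaptive_row_sample`, and `best_rank_k_in_row_span` are hypothetical.

```python
import numpy as np


def row_span_basis(A, rows):
    """Orthonormal basis (as columns) for the span of the selected rows of A."""
    _, s, Vt = np.linalg.svd(A[rows], full_matrices=False)
    if s.size == 0 or s[0] == 0.0:
        return np.zeros((A.shape[1], 0))
    # Drop directions that are zero up to round-off.
    return Vt[s > 1e-12 * s[0]].T


def adaptive_row_sample(A, rounds, per_round, rng):
    """Assumed form of adaptive sampling: in each round, draw rows with
    probability proportional to their squared residual norm."""
    rows = []
    E = A.copy()                       # residual of A w.r.t. the current span
    total_mass = np.sum(A ** 2)
    for _ in range(rounds):
        p = np.sum(E ** 2, axis=1)
        if p.sum() <= 1e-20 * total_mass:
            break                      # the chosen rows already span A
        picked = rng.choice(A.shape[0], size=per_round, p=p / p.sum())
        rows = sorted(set(rows) | set(picked.tolist()))
        Q = row_span_basis(A, rows)
        E = A - (A @ Q) @ Q.T          # project out the chosen rows
    return rows


def best_rank_k_in_row_span(A, rows, k):
    """Best Frobenius-norm approximation of A of rank at most k whose rows
    lie in the span of A[rows]."""
    Q = row_span_basis(A, rows)
    # Truncating the SVD of the coordinates A @ Q gives the best rank-k
    # matrix inside the span, because Q has orthonormal columns.
    U, s, Vt = np.linalg.svd(A @ Q, full_matrices=False)
    k = min(k, s.size)
    return (U[:, :k] * s[:k]) @ Vt[:k] @ Q.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 5
    A = rng.standard_normal((200, k)) @ rng.standard_normal((k, 80))
    A += 0.05 * rng.standard_normal(A.shape)
    rows = adaptive_row_sample(A, rounds=3, per_round=10, rng=rng)
    B = best_rank_k_in_row_span(A, rows, k)
    s = np.linalg.svd(A, compute_uv=False)
    optimal = np.sqrt(np.sum(s[k:] ** 2))   # error of the best rank-k approximation
    print(len(rows), np.linalg.norm(A - B) / optimal)
```

The printed ratio compares the Frobenius error obtained from the sampled rows with the optimal rank-$k$ error from a full SVD; it can never fall below 1, and how close it gets depends on the number of rounds and rows per round chosen here for illustration.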

Santosh S. Vempala | Amit Deshpande

[1] Prabhakar Raghavan, et al. Computing on data streams, 1999, External Memory Algorithms.

[2] Alan M. Frieze, et al. Fast Monte-Carlo algorithms for finding low-rank approximations, 1998, Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280).

[3] Alan M. Frieze, et al. Clustering in large graphs and matrices, 1999, SODA '99.

[4] Philip S. Yu, et al. Fast algorithms for projected clustering, 1999, SIGMOD '99.

[5] Jiří Matoušek, et al. On Approximate Geometric k-Clustering, 2000, Discret. Comput. Geom.

[6] Dimitris Achlioptas, et al. Fast computation of low rank matrix approximations, 2001, STOC '01.

[7] Sudipto Guha, et al. Data-streams and histograms, 2001, STOC '01.

[8] Ziv Bar-Yossef, et al. Sampling lower bounds via information theory, 2003, STOC '03.

[9] Marek Karpinski, et al. Approximation schemes for clustering problems, 2003, STOC '03.

[10] Petros Drineas, et al. Pass efficient algorithms for approximating large matrices, 2003, SODA '03.

[11] Joan Feigenbaum, et al. On graph problems in a semi-streaming model, 2005, Theor. Comput. Sci.

[12] Santosh S. Vempala, et al. Matrix approximation and projective clustering via volume sampling, 2006, SODA '06.

[13] Petros Drineas, et al. Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix, 2004.

[14] Sanjeev Arora, et al. A Fast Random Sampling Algorithm for Sparsifying Matrices, 2006, APPROX-RANDOM.