Matrix approximation and projective clustering via volume sampling

Frieze et al. [17] proved that a small sample of rows of a given matrix <i>A</i> contains a low-rank approximation <i>D</i> that minimizes ||<i>A</i> - <i>D</i>||<sub><i>F</i></sub> to within small additive error, and the sampling can be done efficiently using just two passes over the matrix [12]. In this paper, we generalize this result in two ways. First, we prove that the additive error drops exponentially by iterating the sampling in an adaptive manner. Using this result, we give a pass-efficient algorithm for computing low-rank approximation with reduced additive error. Our second result is that using a natural distribution on subsets of rows (called <i>volume</i> sampling), there exists a subset of <i>k</i> rows whose span contains a factor (<i>k</i> + 1) relative approximation and a subset of <i>k</i> + <i>k</i>(<i>k</i> + 1)/ε rows whose span contains a (1 + ε) relative approximation. The existence of such a small certificate for multiplicative low-rank approximation leads to a PTAS for the following projective clustering problem: Given a set of points <i>P</i> in R<sup><i>d</i></sup>, and integers <i>k</i>, <i>j</i>, find a set of <i>j</i> subspaces <i>F</i><sub>1</sub>, . . ., <i>F</i><sub><i>j</i></sub>, each of dimension at most <i>k</i>, that minimize Σ<sub><i>p</i>∈<i>P</i></sub> min<sub><i>i</i></sub> <i>d</i>(<i>p</i>, <i>F</i><sub><i>i</i></sub>)<sup>2</sup>.
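The two sampling ideas in the abstract can be illustrated with a small NumPy sketch. This is not the paper's algorithm: it is a brute-force illustration under stated assumptions. The function names (`residual`, `adaptive_row_sample`, `volume_distribution`) are hypothetical, volume sampling is assumed to pick a <i>k</i>-subset <i>S</i> of rows with probability proportional to det(<i>A</i><sub><i>S</i></sub><i>A</i><sub><i>S</i></sub><sup>T</sup>), all errors are measured in squared Frobenius norm, and the enumeration over all subsets is only feasible for tiny matrices of rank at least <i>k</i>.

```python
import itertools
import numpy as np


def residual(A, rows):
    """Return A minus its orthogonal projection onto the span of the given rows of A."""
    S = A[list(rows)]
    # pinv(S) @ S is the orthogonal projector onto the row space of S,
    # which stays correct even when the chosen rows are linearly dependent.
    return A - A @ np.linalg.pinv(S) @ S


def adaptive_row_sample(A, s, rounds, rng):
    """Illustrative adaptive sampling: in each round, draw s rows with probability
    proportional to the squared norm of their residual with respect to the span
    of the rows chosen so far."""
    chosen = []
    E = np.array(A, dtype=float)
    tol = 1e-12 * np.sum(E ** 2)  # stop once the residual is numerically zero
    for _ in range(rounds):
        sq = np.sum(E ** 2, axis=1)
        if sq.sum() <= tol:
            break
        idx = rng.choice(A.shape[0], size=s, p=sq / sq.sum())
        chosen = sorted(set(chosen) | {int(i) for i in idx})
        E = residual(A, chosen)
    return chosen, E


def volume_distribution(A, k):
    """Brute-force volume sampling distribution over all k-subsets of rows.
    Assumes rank(A) >= k; exponential in k, so for illustration only."""
    subsets = list(itertools.combinations(range(A.shape[0]), k))
    weights = np.array([np.linalg.det(A[list(S)] @ A[list(S)].T) for S in subsets])
    weights = np.clip(weights, 0.0, None)  # guard against tiny negative round-off
    return subsets, weights / weights.sum()


# Small demonstration on a random matrix (hypothetical data).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
k = 2
subsets, p = volume_distribution(A, k)
errors = np.array([np.sum(residual(A, S) ** 2) for S in subsets])
best_rank_k = np.sum(np.linalg.svd(A, compute_uv=False)[k:] ** 2)
print("best subset error / optimal rank-k error:", errors.min() / best_rank_k)
print("expected error under volume sampling / optimal:", float(p @ errors) / best_rank_k)
```

On inputs like this one, the first printed ratio should not exceed <i>k</i> + 1, which is the existence claim stated in the abstract. The adaptive sampler is a separate illustration of the first result: each round samples from what the previous rounds failed to capture.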

[1]  Dimitris Achlioptas,et al.  Fast computation of low-rank matrix approximations , 2007, JACM.

[2]  Kasturi R. Varadarajan,et al.  Geometric Approximation via Coresets , 2007 .

[3]  Tamás Sarlós,et al.  Improved Approximation Algorithms for Large Matrices via Random Projections , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[4]  Santosh S. Vempala,et al.  Adaptive Sampling and Fast Low-Rank Matrix Approximation , 2006, APPROX-RANDOM.

[5]  Petros Drineas,et al.  Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix , 2004 .

[6]  Sudipto Guha,et al.  Approximation and streaming algorithms for histogram construction problems , 2006, TODS.

[7]  Pankaj K. Agarwal,et al.  Approximation Algorithms for a k-Line Center , 2005, Algorithmica.

[8]  Luis Rademacher,et al.  Matrix Approximation and Projective Clustering via Iterative Sampling , 2005 .

[9]  Sariel Har-Peled,et al.  Coresets for $k$-Means and $k$-Median Clustering and their Applications , 2018, STOC 2004.

[10]  Joan Feigenbaum,et al.  On graph problems in a semi-streaming model , 2005, Theor. Comput. Sci..

[11]  Amit Kumar,et al.  A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions , 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[12]  Michelle Effros,et al.  Rapid near-optimal VQ design with a deterministic data net , 2004, International Symposium on Information Theory, 2004. ISIT 2004. Proceedings.

[13]  Alan M. Frieze,et al.  Clustering Large Graphs via the Singular Value Decomposition , 2004, Machine Learning.

[14]  Nabil H. Mustafa,et al.  k-means projective clustering , 2004, PODS.

[15]  Sariel Har-Peled,et al.  On coresets for k-means and k-median clustering , 2004, STOC '04.

[16]  David Kempe,et al.  A decentralized algorithm for spectral analysis , 2004, STOC '04.

[17]  Amit Kumar,et al.  A simple linear time (1 + ε)-approximation algorithm for geometric k-means clustering in any dimensions , 2004 .

[18]  Michelle Effros,et al.  Deterministic clustering with data nets , 2004, Electron. Colloquium Comput. Complex..

[19]  Ziv Bar-Yossef,et al.  Sampling lower bounds via information theory , 2003, STOC '03.

[20]  Marek Karpinski,et al.  Approximation schemes for clustering problems , 2003, STOC '03.

[21]  Petros Drineas,et al.  Pass efficient algorithms for approximating large matrices , 2003, SODA '03.

[22]  Pankaj K. Agarwal,et al.  Approximation Algorithms for k-Line Center , 2002, ESA.

[23]  Sariel Har-Peled,et al.  Projective clustering in high dimensions using core-sets , 2002, SCG '02.

[24]  T. M. Murali,et al.  A Monte Carlo algorithm for fast projective clustering , 2002, SIGMOD '02.

[25]  Piotr Indyk,et al.  Approximate clustering via core-sets , 2002, STOC '02.

[26]  Rafail Ostrovsky,et al.  Polynomial-time approximation schemes for geometric min-sum median clustering , 2002, JACM.

[27]  Sudipto Guha,et al.  Data-streams and histograms , 2001, STOC '01.

[28]  S. Goreinov,et al.  The maximum-volume concept in approximation by low-rank matrices , 2001 .

[29]  Jirí Matousek,et al.  On Approximate Geometric k-Clustering , 2000, Discret. Comput. Geom..

[30]  Philip S. Yu,et al.  Fast algorithms for projected clustering , 1999, SIGMOD '99.

[31]  Alan M. Frieze,et al.  Clustering in large graphs and matrices , 1999, SODA '99.

[32]  Prabhakar Raghavan,et al.  Computing on data streams , 1999, External Memory Algorithms.

[33]  Alan M. Frieze,et al.  Fast Monte-Carlo algorithms for finding low-rank approximations , 1998, Proceedings 39th Annual Symposium on Foundations of Computer Science.

[34]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[35]  Noga Alon,et al.  The space complexity of approximating the frequency moments , 1996, STOC '96.

[36]  Henryk Wozniakowski,et al.  Estimating the Largest Eigenvalue by the Power and Lanczos Algorithms with a Random Start , 1992, SIAM J. Matrix Anal. Appl..

[37]  G. Golub Matrix computations , 1983 .

[38]  Nimrod Megiddo,et al.  On the complexity of locating linear facilities in the plane , 1982, Oper. Res. Lett..