Bi-criteria linear-time approximations for generalized k-mean/median/center

We consider the problem of approximating a set P of n points in R^d by a collection of j-dimensional flats, and extensions thereof, under the standard median / mean / center measures, in which we wish to minimize, respectively, the sum of the distances from each point of P to its nearest flat, the sum of the squares of these distances, or the maximal such distance. Such problems cannot be approximated unless P=NP, but they do allow bi-criteria approximations, where one allows some leeway in both the number of flats and the quality of the objective function. We give a very simple bi-criteria approximation algorithm that produces at most α(k,j,n) = (kj log n)^{O(j)} flats whose objective value exceeds the optimal objective value for any k j-dimensional flats by a factor of no more than β(j) = 2^{O(j)}. Given this bi-criteria approximation, we can use it to reduce the approximation factor arbitrarily, at the cost of increasing the number of flats. Our algorithm has many advantages over previous work: it is much more widely applicable (a wider set of objective functions and classes of clusters) and much more efficient, reducing the running time bound from O(n^{Poly(k,j)}) to nd · (jk)^{O(j)}. Our algorithm is randomized and succeeds with probability 1/2 (easily boosted to probabilities arbitrarily close to 1).
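
To make the three measures concrete, the following is a minimal sketch of how the median, mean, and center objectives could be evaluated for a given collection of flats. It is not the bi-criteria algorithm itself. The representation of a flat as a point plus an orthonormal basis, and the names dist_to_flat and clustering_costs, are assumptions made for this illustration only.

```python
import math

def dist_to_flat(p, origin, basis):
    """Euclidean distance from point p to the flat origin + span(basis).

    Assumption: basis is a list of orthonormal direction vectors.
    """
    # Translate so the flat passes through the coordinate origin.
    v = [pi - oi for pi, oi in zip(p, origin)]
    # Remove the component of v along each basis direction.
    for b in basis:
        coeff = sum(vi * bi for vi, bi in zip(v, b))
        v = [vi - coeff * bi for vi, bi in zip(v, b)]
    # What remains is orthogonal to the flat; its length is the distance.
    return math.sqrt(sum(vi * vi for vi in v))

def clustering_costs(points, flats):
    """Return the median, mean, and center costs of points w.r.t. flats.

    flats is a list of (origin, basis) pairs (hypothetical representation).
    """
    nearest = [min(dist_to_flat(p, o, B) for o, B in flats) for p in points]
    return {
        "median": sum(nearest),                # sum of distances
        "mean": sum(d * d for d in nearest),   # sum of squared distances
        "center": max(nearest),                # maximal distance
    }

# Example: four points in the plane and two lines (j = 1).
points = [(1.0, 3.0), (5.0, -4.0), (10.0, 7.0), (8.0, 2.0)]
flats = [
    ((0.0, 0.0), [(1.0, 0.0)]),   # the x-axis
    ((10.0, 0.0), [(0.0, 1.0)]),  # the vertical line x = 10
]
costs = clustering_costs(points, flats)
```

In this example the distances to the nearest line are 3, 4, 0, and 2, so the median cost is 9, the mean cost is 29, and the center cost is 4.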
