Approximate clustering via core-sets

In this paper, we show that for several clustering problems one can extract a small set of points (a core-set) such that using it enables us to perform approximate clustering efficiently. The surprising property of these core-sets is that their size is independent of the dimension. Using them, we present (1+ε)-approximation algorithms for the k-center and k-median clustering problems in Euclidean space. The running time of the new algorithms has a linear or near-linear dependency on the number of points and the dimension, and an exponential dependency on 1/ε and k. As such, our results are a substantial improvement over what was previously known. We also present some other clustering results, including (1+ε)-approximate 1-cylinder clustering and k-center clustering with outliers.
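To illustrate how a core-set's size can depend on ε alone, here is a minimal Python sketch of the simplest case, k = 1 (the minimum enclosing ball). It is not the algorithm of this paper. It uses a simplified farthest-point update in the style of later work by Bădoiu and Clarkson: the center repeatedly moves a 1/(i+1) fraction of the way toward the current farthest input point. The points selected along the way form the core-set, and there are at most ⌈1/ε²⌉ + 1 of them regardless of the dimension d or the number of points n. The function name `approx_1_center` and the fixed iteration count are illustrative assumptions. The (1+ε) radius guarantee for this update is taken from the literature and is not proved here.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def approx_1_center(points: Sequence[Point], eps: float) -> Tuple[List[float], float, List[int]]:
    """Hypothetical helper: approximate minimum enclosing ball of `points`.

    Returns (center, radius, coreset_indices). The number of iterations,
    and hence the core-set size, is bounded by ceil(1/eps**2) + 1, which
    depends only on eps, not on the dimension or on len(points).
    """
    if not points:
        raise ValueError("need at least one point")
    if eps <= 0:
        raise ValueError("eps must be positive")

    iterations = math.ceil(1.0 / eps ** 2)
    center = [float(x) for x in points[0]]  # start at an arbitrary input point
    coreset = {0}

    for i in range(1, iterations + 1):
        # Find the input point farthest from the current center.
        dists = [math.dist(p, center) for p in points]
        far = max(range(len(points)), key=dists.__getitem__)
        coreset.add(far)
        # Move the center a 1/(i+1) fraction of the way toward that point.
        step = 1.0 / (i + 1)
        center = [c + step * (p - c) for c, p in zip(center, points[far])]

    # Radius of the ball around the final center that covers every input point.
    radius = max(math.dist(p, center) for p in points)
    return center, radius, sorted(coreset)


if __name__ == "__main__":
    square = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
    center, radius, core = approx_1_center(square, eps=0.1)
    print(center, radius, core)
```

For the four corners of a 2 × 2 square the optimal radius is √2, so with ε = 0.1 the returned radius should lie between √2 and 1.1·√2. The design choice to scan all n points per iteration gives a running time of O(n·d/ε²), linear in both n and d, which mirrors the kind of dependency the abstract describes.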
