Smaller Coresets for k-Median and k-Means Clustering

In this paper, we show that there exists a $(k, \varepsilon)$-coreset for $k$-median and $k$-means clustering of $n$ points in $\mathbb{R}^d$ whose size is independent of $n$. In particular, we construct a $(k, \varepsilon)$-coreset of size $O(k^2/\varepsilon^d)$ for $k$-median clustering and of size $O(k^3/\varepsilon^{d+1})$ for $k$-means clustering.
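The abstract does not restate what a $(k, \varepsilon)$-coreset guarantees. In the standard definition, a weighted point set $S$ is a $(k, \varepsilon)$-coreset for $P$ if, for every set $C$ of $k$ centers, the weighted cost of $S$ lies within a factor $1 \pm \varepsilon$ of the cost of $P$, where the cost is the sum of distances to the nearest center for $k$-median and the sum of squared distances for $k$-means. The following is a minimal Python sketch of that check under these assumptions. The names `clustering_cost`, `satisfies_coreset_bound`, and `snap_to_grid` are hypothetical illustrations, and `snap_to_grid` is a naive uniform-grid reduction, not the construction from this paper. Because the real definition quantifies over all center sets, the sketch can only test the finite list of candidates it is handed.

```python
import math
from collections import Counter


def clustering_cost(points, weights, centers, power=1):
    """Weighted cost: sum of w * dist(p, C)**power over all points.

    power=1 gives the k-median cost, power=2 the k-means cost.
    """
    return sum(
        w * min(math.dist(p, c) for c in centers) ** power
        for p, w in zip(points, weights)
    )


def satisfies_coreset_bound(points, coreset, weights, center_sets, eps, power=1):
    """Check (1 - eps) * cost(P, C) <= cost(S, C) <= (1 + eps) * cost(P, C).

    The (k, eps)-coreset definition requires this for EVERY set C of k
    centers; this sketch only tests the candidate center sets supplied.
    """
    unit = [1.0] * len(points)  # every original point has weight 1
    for centers in center_sets:
        full = clustering_cost(points, unit, centers, power)
        approx = clustering_cost(coreset, weights, centers, power)
        if not ((1 - eps) * full <= approx <= (1 + eps) * full):
            return False
    return True


def snap_to_grid(points, cell):
    """Hypothetical naive reduction (NOT the paper's construction): map each
    point to the lower corner of its grid cell, weighted by the number of
    points that fall into that cell."""
    counts = Counter(tuple(math.floor(x / cell) * cell for x in p) for p in points)
    reps = list(counts)
    return reps, [float(counts[r]) for r in reps]


# Usage: two tight clusters collapse to two weighted representatives.
P = [(0.01, 0.02), (0.03, 0.01), (0.02, 0.04), (10.01, 10.03), (10.02, 10.01)]
S, W = snap_to_grid(P, cell=0.1)
print(satisfies_coreset_bound(P, S, W, [[(5.0, 5.0)]], eps=0.05))               # True
print(satisfies_coreset_bound(P, S, W, [[(0.0, 0.0), (10.0, 10.0)]], eps=0.05))  # False
```

The second check fails because, with centers placed on the clusters themselves, the snapped set has cost zero while the original set does not. A relative $(1 \pm \varepsilon)$ guarantee therefore cannot come from a uniform grid alone; it has to hold even near the optimal solution, where the total cost is smallest.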
