Coresets for Discrete Integration and Clustering

Given a set $P$ of $n$ points on the real line and a (potentially infinite) family $\mathcal{F}$ of functions, we investigate the problem of finding a small (weighted) subset $S \subseteq P$ such that, for any $f \in \mathcal{F}$, $f(P)$ is a $(1 \pm \varepsilon)$-approximation to $f(S)$. Here, $f(Q) = \sum_{q \in Q} w(q) f(q)$ denotes the weighted discrete integral of $f$ over the point set $Q$, where $w(q)$ is the weight assigned to the point $q$. We study this problem and provide tight bounds on the size of $S$ for several families of functions. As an application, we present some coreset constructions for clustering.
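The definitions above can be made concrete with a small Python sketch. It assumes points are floats, stores a weighted set as a dict from point to weight $w(q)$, and uses the convention $(1-\varepsilon) f(P) \le f(S) \le (1+\varepsilon) f(P)$ for nonnegative $f$. Because $\mathcal{F}$ may be infinite, the check only spot-tests a finite list of functions. The names `discrete_integral` and `is_eps_approx`, the sample points, and the hand-picked subset are hypothetical illustrations, not the construction from this work. The test functions are 1-median costs $f_c(x) = |x - c|$, chosen to echo the clustering application.

```python
from typing import Callable, Dict, Iterable

# A weighted point set on the real line: point -> weight w(q).
WeightedSet = Dict[float, float]


def discrete_integral(f: Callable[[float], float], Q: WeightedSet) -> float:
    """Weighted discrete integral f(Q) = sum over q in Q of w(q) * f(q)."""
    return sum(w * f(q) for q, w in Q.items())


def is_eps_approx(P: WeightedSet, S: WeightedSet,
                  functions: Iterable[Callable[[float], float]],
                  eps: float) -> bool:
    """Return True if (1 - eps) f(P) <= f(S) <= (1 + eps) f(P) for every f given.

    Assumes each f is nonnegative. This only spot-checks a finite list of
    functions; it cannot certify S for an infinite family F.
    """
    for f in functions:
        fp = discrete_integral(f, P)
        fs = discrete_integral(f, S)
        if abs(fs - fp) > eps * fp:
            return False
    return True


# Hypothetical input: two tight groups of points, each point with weight 1.
P = {x: 1.0 for x in [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]}

# Hand-picked weighted subset: the middle point of each group carries the group's weight.
S = {0.1: 3.0, 10.1: 3.0}

# A poor candidate: all of the weight on a single point.
S_bad = {0.1: 6.0}

# Test functions: 1-median costs f_c(x) = |x - c| for a few centers c
# (the default argument c=c binds each center when the lambda is created).
centers = [-5.0, 0.1, 5.0, 10.1, 20.0]
tests = [lambda x, c=c: abs(x - c) for c in centers]

print(is_eps_approx(P, S, tests, eps=0.01))      # True
print(is_eps_approx(P, S_bad, tests, eps=0.01))  # False
```

Placing each group's total weight on its middle point keeps the relative error below 1% for every tested center (the worst cases are centers inside a group, e.g. $c = 0.1$ gives $f(P) = 30.2$ and $f(S) = 30$), while collapsing all the weight onto one point fails for centers far from it.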
