Improved Coresets for Clustering with Capacity and Fairness Constraints

We study coresets for clustering with capacity and fairness constraints. Our main result is a near-linear time algorithm that constructs $\varepsilon$-coresets of size $\tilde{O}(k^2\varepsilon^{-2z-2})$ for capacitated $(k,z)$-clustering, improving on the recent $\tilde{O}(k^3\varepsilon^{-3z-2})$ bound of [BCAJ+22, HJLW23]. As a corollary, we also save a factor of $k\varepsilon^{-z}$ in the coreset size for fair $(k,z)$-clustering relative to the same prior work. We fundamentally improve the hierarchical uniform sampling framework of [BCAJ+22] by adaptively selecting the sample size on each ring instance, proportional to its clustering cost with respect to an optimal solution. Our analysis relies on a key geometric observation that reduces the total number of ``effective centers'' from the $\tilde{O}(k^2\varepsilon^{-z})$ of [BCAJ+22] to merely $O(k\log \varepsilon^{-1})$, by allowing us to ``ignore'' all center points that are too far from or too close to the ring center.
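The adaptive allocation idea can be pictured with a minimal sketch. It assumes the rings and their clustering costs have already been computed. The function names, the `budget` parameter, and the rounding rule are hypothetical illustrations and not the paper's actual construction or sample-size bounds. Each ring receives a share of the sample budget proportional to its cost, points are drawn uniformly inside the ring, and each sample is reweighted so that the ring's total weight is preserved.

```python
import numpy as np


def allocate_sample_sizes(ring_costs, ring_sizes, budget):
    """Split a sample budget across rings in proportion to ring cost.

    Illustrative rule only: every non-empty ring gets at least one sample
    and never more samples than it has points.
    """
    costs = np.asarray(ring_costs, dtype=float)
    sizes = np.asarray(ring_sizes, dtype=int)
    total = costs.sum()
    if total > 0:
        shares = costs / total
    else:
        # Degenerate case: all costs are zero, so split evenly.
        shares = np.full(len(costs), 1.0 / len(costs))
    raw = np.ceil(shares * budget).astype(int)
    return np.minimum(np.maximum(raw, 1), sizes)


def sample_rings(rings, ring_costs, budget, seed=0):
    """Uniformly sample each ring; return point indices and weights.

    `rings` is a list of non-empty lists of point indices (assumed given).
    """
    rng = np.random.default_rng(seed)
    sizes = [len(r) for r in rings]
    counts = allocate_sample_sizes(ring_costs, sizes, budget)
    indices, weights = [], []
    for ring, m_i in zip(rings, counts):
        m_i = int(m_i)
        chosen = rng.choice(ring, size=m_i, replace=False)
        indices.extend(int(i) for i in chosen)
        # Each sample stands in for len(ring) / m_i original points.
        weights.extend([len(ring) / m_i] * m_i)
    return np.array(indices), np.array(weights)
```

In this sketch a ring that carries most of the cost receives most of the samples, while the weights within each ring still sum to the ring's size, which is the property that capacity-style constraints depend on.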

[1]  S. Jiang,et al.  Coresets for Clustering with General Assignment Constraints , 2023, ArXiv.

[2]  Kasper Green Larsen,et al.  Improved Coresets for Euclidean k-Means , 2022, NeurIPS.

[3]  Jianing Lou,et al.  Near-optimal Coresets for Robust Clustering , 2022, ICLR.

[4]  Robert Krauthgamer,et al.  The Power of Uniform Sampling for Coresets , 2022, 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS).

[5]  Jason Li,et al.  On the Fixed-Parameter Tractability of Capacitated Clustering , 2022, ICALP.

[6]  Kasper Green Larsen,et al.  Towards optimal lower bounds for k-median and k-means coresets , 2022, STOC.

[7]  Qilong Feng,et al.  New Approximation Algorithms for Fair k-median Problem , 2022, ArXiv.

[8]  David Saulpic,et al.  A new coreset framework for clustering , 2021, STOC.

[9]  Fedor V. Fomin,et al.  On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications , 2020, ICALP.

[10]  W. Tai.  Optimal Coreset for Gaussian Kernel Density Estimation , 2020, SoCG.

[11]  Robert Krauthgamer,et al.  Coresets for Clustering in Excluded-minor Graphs and Beyond , 2020, SODA.

[12]  Nisheeth K. Vishnoi,et al.  Coresets for clustering in Euclidean spaces: importance sampling is nearly optimal , 2020, STOC.

[13]  W. Tai,et al.  Near-Optimal Coresets of Kernel Density Estimates , 2019, Discrete & Computational Geometry.

[14]  Robert Krauthgamer,et al.  Coresets for Clustering in Graphs of Bounded Treewidth , 2019, ICML.

[15]  Nisheeth K. Vishnoi,et al.  Coresets for Clustering with Fairness Constraints , 2019, NeurIPS.

[16]  Dan Feldman,et al.  Coresets for Gaussian Mixture Models of Any Shape , 2019, ArXiv.

[17]  Deeparnab Chakrabarty,et al.  Fair Algorithms for Clustering , 2019, NeurIPS.

[18]  Melanie Schmidt,et al.  Fair Coresets and Streaming Algorithms for Fair k-means , 2019, WAOA.

[19]  Vincent Cohen-Addad,et al.  Approximation Schemes for Capacitated Clustering in Doubling Metrics , 2018, SODA.

[20]  Jaroslaw Byrka,et al.  Constant factor FPT approximation for capacitated k-median , 2018, ESA.

[21]  David P. Woodruff,et al.  Strong Coresets for k-Median and Subspace Approximation: Goodbye Dimension , 2018, 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).

[22]  David P. Woodruff,et al.  On Coresets for Logistic Regression , 2018, NeurIPS.

[23]  Jian Li,et al.  Epsilon-Coresets for Clustering (with Outliers) in Doubling Metrics , 2018, 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).

[24]  Silvio Lattanzi,et al.  Fair Clustering Through Fairlets , 2018, NIPS.

[25]  Jeff M. Phillips,et al.  Near-Optimal Coresets of Kernel Density Estimates , 2018, Discrete & Computational Geometry.

[26]  Andreas Krause,et al.  Training Gaussian Mixture Models at Scale via Coresets , 2017, J. Mach. Learn. Res.

[27]  Shi Li,et al.  Approximating capacitated k-median with (1 + ∊)k open facilities , 2014, SODA.

[28]  Shi Li,et al.  On Uniform Capacitated k-Median Beyond the Natural LP Relaxation , 2014, SODA.

[29]  Dan Feldman,et al.  Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering , 2013, SODA.

[30]  Michael Langberg,et al.  A unified framework for approximating and clustering data , 2011, STOC.

[31]  Ke Chen,et al.  On Coresets for k-Median and k-Means Clustering in Metric and Euclidean Spaces and Their Applications , 2009, SIAM J. Comput.

[32]  Sudipto Guha,et al.  A constant-factor approximation algorithm for the k-median problem (extended abstract) , 1999, STOC '99.

[33]  Satish Rao,et al.  Approximation schemes for Euclidean k-medians and related problems , 1998, STOC '98.

[34]  Xuan Wu,et al.  Towards Optimal Coreset Construction for (k, z)-Clustering: Breaking the Quadratic Dependency on k , 2022, ArXiv.

[35]  Sariel Har-Peled,et al.  Smaller Coresets for K-median and K-means Clustering , 2005.