Optimal Fully Dynamic k-Centers Clustering

We present the first algorithm for fully dynamic k-centers clustering in an arbitrary metric space that maintains an optimal (2 + ε)-approximation in O(k · polylog(n, ∆)) amortized update time. Here, n is an upper bound on the number of active points at any time, and ∆ is the aspect ratio of the data. Previously, the best known amortized update time was O(k² · polylog(n, ∆)), due to Chan, Guerquin, and Sozio [CGS18]. We demonstrate that the runtime of our algorithm is optimal up to polylog(n, ∆) factors, even for insertion-only streams, which settles the complexity of fully dynamic k-centers clustering. In particular, we prove that any algorithm for k-clustering tasks in arbitrary metric spaces, including k-means, k-medians, and k-centers, must make at least Ω(nk) distance queries to achieve any non-trivial approximation factor. Despite this lower bound for arbitrary metrics, we demonstrate that an update time sublinear in k is possible for metric spaces which admit locality-sensitive hash functions (LSH). Namely, we give a black-box transformation which takes a locality-sensitive hash family for a metric space and produces a faster fully dynamic k-centers algorithm for that space. In particular, for a large class of metrics including Euclidean space, the ℓp spaces, the Hamming metric, and the Jaccard metric, for any c > 1 our results yield a c(4 + ε)-approximate k-centers solution in O(n^{1/c} · polylog(n, ∆)) amortized update time, simultaneously for all k ≥ 1. Previously, the only known comparable result was an O(c log n) approximation for Euclidean space due to Schmidt and Sohler, running in the same amortized update time [SS19].
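For intuition about the objective being dynamized: k-centers asks for k centers minimizing the maximum distance from any point to its nearest center. The 2-approximation factor that the dynamic algorithm maintains is the same one achieved statically by the classic farthest-first traversal of Gonzalez [26]. The sketch below is that static baseline only, not the paper's dynamic algorithm; the function name and the choice of Euclidean distance are illustrative assumptions.

```python
import math

def k_centers_greedy(points, k):
    """Farthest-first traversal (Gonzalez): static 2-approximation for
    k-centers in any metric space. Illustrative sketch, assuming points
    are coordinate tuples under Euclidean distance."""
    def dist(p, q):
        return math.dist(p, q)

    centers = [points[0]]  # arbitrary first center
    # d[i] = distance from points[i] to its nearest chosen center so far
    d = [dist(p, centers[0]) for p in points]
    for _ in range(k - 1):
        # pick the point farthest from all current centers
        i = max(range(len(points)), key=d.__getitem__)
        centers.append(points[i])
        # update nearest-center distances against the new center
        d = [min(d[j], dist(points[j], points[i])) for j in range(len(points))]
    # max(d) is the clustering radius; it is at most twice the optimum
    return centers, max(d)
```

A fully dynamic algorithm must maintain a solution of comparable quality while points are inserted and deleted, without recomputing such a traversal from scratch; doing so in O(k · polylog(n, ∆)) amortized time per update is the paper's main contribution.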

[1]  Dan Feldman,et al.  Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering , 2013, SODA.

[2]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[3]  Haim Kaplan,et al.  Adversarially Robust Streaming Algorithms via Differential Privacy , 2020, NeurIPS.

[4]  Nikos Parotsidis,et al.  Fully Dynamic Consistent Facility Location , 2019, NeurIPS.

[5]  Vladimir Braverman,et al.  New Frameworks for Offline and Streaming Coreset Constructions , 2016, ArXiv.

[6]  C. Greg Plaxton,et al.  Optimal Time Bounds for Approximate Clustering , 2002, Machine Learning.

[7]  Tomás Feder,et al.  Optimal algorithms for approximate clustering , 1988, STOC '88.

[8]  Edith Cohen,et al.  A Framework for Adversarial Streaming via Differential Privacy and Difference Estimators , 2021, ArXiv.

[9]  Oded Goldreich,et al.  Introduction to Property Testing , 2017 .

[10]  Shiri Chechik,et al.  Fully Dynamic Maximal Independent Set in Expected Poly-Log Update Time , 2019, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS).

[11]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[12]  Krzysztof Onak,et al.  Fully Dynamic MIS in Uniformly Sparse Graphs , 2018, ICALP.

[13]  Piotr Indyk,et al.  Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality , 2012, Theory Comput..

[14]  Rajeev Motwani,et al.  Incremental clustering and dynamic information retrieval , 1997, STOC '97.

[15]  Yeshwanth Cherapanamjeri,et al.  On Adaptive Distance Estimation , 2020, NeurIPS.

[16]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[17]  Andrei Z. Broder,et al.  On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[18]  Silvio Lattanzi,et al.  Consistent k-Clustering , 2017, ICML.

[19]  David B. Shmoys,et al.  A unified approach to approximation algorithms for bottleneck problems , 1986, JACM.

[20]  Andrew Y. Ng,et al.  Learning Feature Representations with K-Means , 2012, Neural Networks: Tricks of the Trade.

[21]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Krzysztof Onak,et al.  Fully Dynamic Maximal Independent Set with Sublinear in n Update Time , 2018, SODA.

[23]  Michael Langberg,et al.  A unified framework for approximating and clustering data , 2011, STOC.

[24]  Manoj Gupta,et al.  Simple dynamic algorithms for Maximal Independent Set and other problems , 2018, ArXiv.

[25]  Alexandr Andoni,et al.  Earth mover distance over high-dimensional spaces , 2008, SODA '08.

[26]  Teofilo F. GONZALEZ,et al.  Clustering to Minimize the Maximum Intercluster Distance , 1985, Theor. Comput. Sci..

[27]  Charu C. Aggarwal,et al.  Graph Clustering , 2010, Encyclopedia of Machine Learning and Data Mining.

[28]  Keren Censor-Hillel,et al.  Optimal Dynamic Distributed MIS , 2015, PODC.

[29]  Samir Khuller,et al.  Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity , 2008, APPROX-RANDOM.

[30]  Shi Li,et al.  Consistent k-Median: Simpler, Better and Robust , 2020, AISTATS.

[31]  Claire Mathieu,et al.  Dynamic Clustering to Minimize the Sum of Radii , 2017, Algorithmica.

[32]  Nisheeth K. Vishnoi,et al.  Coresets for clustering in Euclidean spaces: importance sampling is nearly optimal , 2020, STOC.

[33]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[34]  Mohammad Ghodsi,et al.  A Composable Coreset for k-Center in Doubling Metrics , 2019, CCCG.

[35]  Dariusz Leniowski,et al.  Fully Dynamic k-Center Clustering in Low Dimensional Metrics , 2021, ALENEX.

[36]  B. Jaumard,et al.  Cluster Analysis and Mathematical Programming , 2003 .

[37]  David P. Woodruff,et al.  A Framework for Adversarially Robust Streaming Algorithms , 2020, SIGMOD Rec..

[38]  Rafail Ostrovsky,et al.  Low distortion embeddings for edit distance , 2005, STOC '05.

[39]  Sariel Har-Peled,et al.  On coresets for k-means and k-median clustering , 2004, STOC '04.

[40]  Rina Panigrahy,et al.  Better streaming algorithms for clustering problems , 2003, STOC '03.

[41]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[42]  Hengjie Zhang,et al.  Improved Algorithms for Fully Dynamic Maximal Independent Set , 2018, ArXiv.

[43]  Anirban Dasgupta,et al.  Fast locality-sensitive hashing , 2011, KDD.

[44]  Dana Ron,et al.  Property testing and its connection to learning and approximation , 1998, JACM.

[45]  Christian Sohler,et al.  Fully dynamic hierarchical diameter k-clustering and k-center , 2019, ArXiv.

[46]  Tight Bounds for Adversarially Robust Streams and Sliding Windows via Difference Estimators , 2020, 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS).

[47]  George L. Nemhauser,et al.  Easy and hard bottleneck location problems , 1979, Discret. Appl. Math..

[48]  T.-H. Hubert Chan,et al.  Fully Dynamic k-Center Clustering , 2018, WWW.

[49]  Silvio Lattanzi,et al.  Consistent k-Clustering for General Metrics , 2020, SODA.

[50]  Ramgopal R. Mettu,et al.  Approximation algorithms for np -hard clustering problems , 2002 .

[51]  Soheil Behnezhad,et al.  Fully Dynamic Maximal Independent Set with Polylogarithmic Update Time , 2019, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS).