Fully Dynamic Consistent Facility Location

We consider classic clustering problems in fully dynamic data streams, where data elements can be both inserted and deleted. In this context, several parameters are of importance: (1) the quality of the solution after each insertion or deletion, (2) the time it takes to update the solution, and (3) how different consecutive solutions are. The question of obtaining efficient algorithms in this context for facility location, $k$-median and $k$-means was raised in a recent paper by Hubert-Chan et al. [WWW'18] and also arises as a natural follow-up to the online model with recourse studied by Lattanzi and Vassilvitskii [ICML'17] (i.e., in insertion-only streams). In this paper, we focus on general metric spaces and mainly on the facility location problem. We give an arguably simple algorithm that maintains a constant-factor approximation with $O(n\log n)$ update time and $O(n)$ total recourse. This improves over the naive algorithm that recomputes a solution at each time step, which can take up to $O(n^2)$ update time and $O(n^2)$ total recourse. These bounds are nearly optimal: in a general metric space, inserting a point takes $O(n)$ time just to describe its distances to the other points, and we give a simple lower bound of $\Omega(n)$ on the recourse. Moreover, we generalize this result to the $k$-median and $k$-means problems: our algorithm maintains a constant-factor approximation in time $\widetilde{O}(n+k^2)$. We complement our analysis with experiments showing that the cost of the solution maintained by our algorithm at any time $t$ is very close to the cost of a solution obtained by quickly recomputing from scratch at time $t$, while our algorithm has a much better running time.
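To make parameters (1) and (3) concrete, here is a minimal Python sketch of how solution cost and recourse could be measured. It is not the algorithm of the paper. It assumes a uniform opening cost per facility and counts recourse as the number of facilities opened or closed between consecutive solutions; the function names and these modelling choices are illustrative assumptions, not definitions taken from the paper.

```python
from typing import Callable, Hashable, Iterable, Sequence, Set

Point = Hashable


def facility_location_cost(
    points: Iterable[Point],
    facilities: Iterable[Point],
    dist: Callable[[Point, Point], float],
    opening_cost: float,
) -> float:
    """Opening cost of every facility plus each point's distance to its
    nearest open facility (uniform opening cost is an assumption)."""
    pts = list(points)
    facs = list(facilities)
    if pts and not facs:
        raise ValueError("at least one facility must be open to serve the points")
    connection = sum(min(dist(p, f) for f in facs) for p in pts)
    return opening_cost * len(facs) + connection


def recourse(previous: Set[Point], current: Set[Point]) -> int:
    """Number of facilities opened or closed between two consecutive
    solutions (one possible way to measure recourse, assumed here)."""
    return len(set(previous) ^ set(current))


def total_recourse(solutions: Sequence[Set[Point]]) -> int:
    """Sum of the recourse over every pair of consecutive solutions."""
    return sum(recourse(a, b) for a, b in zip(solutions, solutions[1:]))


if __name__ == "__main__":
    # Points on a line, with absolute difference as the metric.
    line_dist = lambda a, b: abs(a - b)
    print(facility_location_cost([0, 1, 10], {0, 10}, line_dist, 2.0))  # 5.0
    print(total_recourse([{0}, {0, 10}, {1, 10}]))  # 3
```

In this toy run the cost is two opening costs (4.0) plus connection costs 0 + 1 + 0, and the total recourse is 1 (opening a facility at 10) plus 2 (moving a facility from 0 to 1).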

[1]  Ola Svensson,et al.  Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms , 2016, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[2]  Vijay V. Vazirani,et al.  Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation , 2001, JACM.

[3]  Teofilo F. Gonzalez,et al.  Clustering to Minimize the Maximum Intercluster Distance , 1985, Theor. Comput. Sci.

[4]  Rina Panigrahy,et al.  Better streaming algorithms for clustering problems , 2003, STOC '03.

[5]  Andreas Krause,et al.  Distributed and Provably Good Seedings for k-Means in Constant Rounds , 2017, ICML.

[6]  Piotr Sankowski,et al.  Online Facility Location with Deletions , 2018, ESA.

[7]  Christian Sohler,et al.  Facility Location in Dynamic Geometric Data Streams , 2008, ESA.

[8]  C. Greg Plaxton,et al.  Optimal Time Bounds for Approximate Clustering , 2002, Machine Learning.

[9]  Christian Sohler,et al.  Coresets in dynamic geometric data streams , 2005, STOC '05.

[10]  Russell Bent,et al.  A simple and deterministic competitive algorithm for online facility location , 2004, Inf. Comput.

[11]  Claire Mathieu,et al.  Dynamic Clustering to Minimize the Sum of Radii , 2017, Algorithmica.

[12]  C. Greg Plaxton,et al.  The Online Median Problem , 1999, SIAM J. Comput.

[13]  Kamesh Munagala,et al.  Local Search Heuristics for k-Median and Facility Location Problems , 2004, SIAM J. Comput.

[14]  David M. Mount,et al.  A local search approximation algorithm for k-means clustering , 2002, SCG '02.

[15]  Sudipto Guha,et al.  Improved combinatorial algorithms for the facility location and k-median problems , 1999, 40th Annual Symposium on Foundations of Computer Science.

[16]  T.-H. Hubert Chan,et al.  Fully Dynamic k-Center Clustering , 2018, WWW.

[17]  Alexander Munteanu,et al.  Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms , 2017, KI - Künstliche Intelligenz.

[18]  Shi Li,et al.  A 1.488 approximation algorithm for the uncapacitated facility location problem , 2011, Inf. Comput.

[19]  Harry Lang.  Online Facility Location against a t-Bounded Adversary , 2018, SODA.

[20]  Rafail Ostrovsky,et al.  Streaming k-means on well-clusterable data , 2011, SODA '11.

[21]  Vladimir Braverman,et al.  Clustering Problems on Sliding Windows , 2016, SODA.

[22]  Artur Czumaj,et al.  (1+ε)-approximation for facility location in data streams , 2013, SODA.

[23]  Rajeev Motwani,et al.  Incremental clustering and dynamic information retrieval , 1997, STOC '97.

[24]  Dimitris Fotakis.  Incremental algorithms for Facility Location and k-Median , 2006, Theor. Comput. Sci.

[25]  Dariusz Leniowski,et al.  A Tree Structure For Dynamic Facility Location , 2019, ESA.

[26]  Anupam Gupta,et al.  Simpler Analyses of Local Search Algorithms for Facility Location , 2008, ArXiv.

[27]  Samir Khuller,et al.  Greedy strikes back: improved facility location algorithms , 1998, SODA '98.

[28]  Dimitris Fotakis.  A Primal-Dual Algorithm for Online Non-uniform Facility Location , 2005, Panhellenic Conference on Informatics.

[29]  Adam Meyerson,et al.  Fast and Accurate k-means For Large Datasets , 2011, NIPS.

[30]  Silvio Lattanzi,et al.  Consistent k-Clustering , 2017, ICML.

[31]  Adam Meyerson,et al.  Online facility location , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[32]  Dimitris Fotakis,et al.  On the Competitive Ratio for Online Facility Location , 2003, Algorithmica.

[33]  Vladimir Braverman,et al.  Clustering High Dimensional Dynamic Data Streams , 2017, ICML.

[34]  Piotr Indyk,et al.  Algorithms for dynamic geometric problems over data streams , 2004, STOC '04.