Dimensionality-adaptive k-center in sliding windows

In this paper we present a novel streaming algorithm for the k-center clustering problem in general metric spaces under the sliding window model. The algorithm maintains a small coreset from which, at any time, a solution to the k-center problem on the current window can be computed, with an approximation quality that can be made arbitrarily close to the best approximation attainable by a sequential algorithm running on the entire window. Remarkably, the size of the coreset is independent of the window size and is upper bounded by a function of k, of the desired accuracy, and of the doubling dimension of the metric space induced by the stream. For streams of bounded doubling dimension, the coreset size is linear in k. A major strength of the algorithm is that it is fully oblivious to the doubling dimension of the stream and adapts to the characteristics of each individual window. Moreover, unlike previous works, the algorithm can be made oblivious to the aspect ratio of the metric space, a parameter related to the spread of distances. We also provide experimental evidence of the practical viability of the approach and of its superiority over the current state of the art.
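To make the problem setting concrete, the following is a minimal sketch of the baseline that the abstract compares against: a sequential algorithm run on the entire window. It is not the coreset algorithm of the paper. It assumes Euclidean points and uses the classical greedy farthest-first traversal (a 2-approximation for k-center) as the sequential algorithm; the names `farthest_first` and `NaiveWindow` are hypothetical and chosen only for illustration.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def farthest_first(points, k):
    """Greedy farthest-first traversal for k-center.

    Returns (centers, radius), where radius is the largest distance
    from any point to its nearest chosen center.
    """
    points = list(points)
    if not points or k <= 0:
        raise ValueError("need at least one point and k >= 1")
    centers = [points[0]]
    # d[i] = distance from points[i] to its nearest center chosen so far
    d = [dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=d.__getitem__)  # farthest point
        if d[i] == 0:
            break  # every point already coincides with some center
        centers.append(points[i])
        d = [min(d[j], dist(points[j], points[i])) for j in range(len(points))]
    return centers, max(d)


class NaiveWindow:
    """Stores the whole window (assumption: illustrative baseline only).

    Memory grows linearly with the window size, which is exactly the
    cost a window-size-independent coreset is meant to avoid.
    """

    def __init__(self, size, k):
        self.window = deque(maxlen=size)  # oldest point is evicted automatically
        self.k = k

    def insert(self, point):
        self.window.append(point)

    def query(self):
        return farthest_first(self.window, self.k)


if __name__ == "__main__":
    w = NaiveWindow(size=3, k=2)
    for p in [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]:
        w.insert(p)
    centers, radius = w.query()
    print(centers, radius)
```

In this sketch every query rescans the full window. The approach described in the abstract instead answers queries from a coreset whose size does not depend on the window length.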
