k-Center Clustering with Outliers in the Sliding-Window Model

The k-center problem for a point set P asks for a collection of k congruent balls (that is, balls of equal radius) that together cover all the points in P and whose radius is minimized. The k-center problem with outliers is defined similarly, except that z of the points in P need not be covered, for a given parameter z. We study the k-center problem with outliers in data streams in the sliding-window model. In this model we are given a possibly infinite stream P = ⟨p₁, p₂, p₃, …⟩ of points and a time window of length W, and we want to maintain a small sketch of the set P(t) of points currently in the window such that, using the sketch, we can approximately solve the problem on P(t). We present the first algorithm for the k-center problem with outliers in the sliding-window model. The algorithm works for the case where the points come from a space of bounded doubling dimension, and it maintains a set S(t) such that an optimal solution on S(t) gives a (1+ε)-approximate solution on P(t). The algorithm uses O((kz/ε^d) log σ) storage, where d is the doubling dimension of the underlying space and σ is the spread of the points in the stream. Algorithms providing a (1+ε)-approximation were not even known in the setting without outliers or in the insertion-only setting with outliers. We also present a lower bound showing that any algorithm that provides a (1+ε)-approximation must use Ω((kz/ε) log σ) storage.
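To make the objective concrete, the following minimal Python sketch evaluates the cost of a candidate solution: given k centers and an outlier budget z, it returns the smallest common radius that covers all but at most z points. It is an illustration of the problem definition only, not the sketch-maintenance algorithm described above. The function names `radius_with_outliers` and `window_contents` are hypothetical, the points are assumed to lie in Euclidean space, and the window is assumed to be count-based (the last W arrivals), which is one common reading of the sliding-window model.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def radius_with_outliers(points, centers, z):
    """Smallest common radius r such that balls of radius r around
    `centers` cover all but at most z of `points`."""
    if len(points) <= z:
        return 0.0  # every point may be declared an outlier
    # Distance from each point to its nearest center, in ascending order.
    nearest = sorted(min(dist(p, c) for c in centers) for p in points)
    # The z farthest points are discarded; the next farthest sets the radius.
    return nearest[len(points) - z - 1]


def window_contents(stream, W):
    """Yield P(t) after each arrival, assuming a count-based window
    that holds the W most recent points (an assumption of this sketch)."""
    window = deque(maxlen=W)  # old points expire automatically
    for p in stream:
        window.append(p)
        yield list(window)


# Two tight clusters plus one far-away point.
pts = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 100)]
ctrs = [(0.5, 0), (10.5, 0)]
r_with_outlier = radius_with_outliers(pts, ctrs, z=1)
r_without_outlier = radius_with_outliers(pts, ctrs, z=0)
last_window = list(window_contents(pts, W=3))[-1]
```

With z = 1 the far-away point is ignored and the radius is determined by the two clusters; with z = 0 the same centers need a much larger radius. Note that this stores the whole window explicitly, which is exactly what a small sketch S(t) is meant to avoid.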
