Computing k Centers over Streaming Data for Small k

In this paper, we consider the k-center problem for streaming points in ℝ^d. More precisely, we consider the single-pass streaming model, in which each point in the stream may be examined only once and only a small amount of information can be stored. Since the available memory is much smaller than the size of the data in this model, it is important to develop algorithms whose space complexity does not depend on the number of input points. We present an approximation algorithm for k = 2 that achieves an approximation factor of (2 + ε) using O(d/ε) space and update time in arbitrary dimensions, for any metric. We also show that our algorithm can be extended to approximate an optimal k-center within a factor of (2 + ε) for k > 2.
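The single-pass model described above can be made concrete with a small sketch. The code below is not the (2 + ε) algorithm of this paper. It is a minimal Python version of the classic doubling algorithm of Charikar, Chekuri, Feder and Motwani (STOC 1997), which stores at most k + 1 points, never revisits a discarded point, and guarantees a covering radius of at most 8 times the optimum. The class name DoublingKCenter and its methods are hypothetical names chosen for illustration, and points are assumed to be tuples of coordinates under the Euclidean metric.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class DoublingKCenter:
    """Hypothetical illustration: classic doubling algorithm for streaming
    k-center (Charikar et al. 1997), NOT the paper's (2 + eps) method.

    Stores at most k + 1 points. Invariants after each update:
      * every point seen so far lies within 4 * r of some stored center;
      * stored centers are distinct, and pairwise more than 2 * r apart once r > 0;
      * once r > 0, r <= 2 * OPT, so the covering radius is at most 8 * OPT.
    """

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be positive")
        self.k = k
        self.r = 0.0  # current scale; stays 0 until k + 1 distinct points arrive
        self.centers: List[Point] = []

    def _dist_to_centers(self, p: Sequence[float]) -> float:
        return min((math.dist(p, c) for c in self.centers), default=math.inf)

    def insert(self, p: Sequence[float]) -> None:
        p = tuple(p)
        # A point within 2r of a stored center is covered and discarded for good.
        if self._dist_to_centers(p) <= 2 * self.r:
            return
        self.centers.append(p)
        while len(self.centers) > self.k:
            if self.r == 0:
                # k + 1 distinct points: two share an optimal cluster,
                # so OPT >= (minimum pairwise distance) / 2.
                lower = min(
                    math.dist(a, b)
                    for i, a in enumerate(self.centers)
                    for b in self.centers[i + 1:]
                ) / 2
            else:
                # Centers are pairwise more than 2r apart, hence OPT > r.
                lower = self.r
            self.r = 2 * lower
            # Greedily keep centers lying more than 2r from every kept center;
            # each dropped center is within 2r of a kept one.
            kept: List[Point] = []
            for c in self.centers:
                if min((math.dist(c, q) for q in kept), default=math.inf) > 2 * self.r:
                    kept.append(c)
            self.centers = kept

    def radius_bound(self) -> float:
        """Upper bound on the distance from any seen point to its nearest center."""
        return 4 * self.r
```

For example, with k = 2 and the points 0, 1, 10 and 11 on a line, the sketch keeps the centers (0,) and (10,) with r = 1, so every point is certified to lie within distance 4 of a center (the optimal radius here is 0.5). The stored state never exceeds k + 1 points, which is the kind of input-size-independent space bound the abstract calls for, although the approximation factor is much weaker than (2 + ε).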
