An improved data stream algorithm for clustering

Abstract

In the k-center problem for streaming points in d-dimensional metric space, input points arrive in a data stream, and the goal is to find, by examining them, k congruent balls of smallest radius whose union covers all input points. In the single-pass streaming model, each input point may be examined only once, and the amount of space available for storing relevant information about the points is limited. In this paper, we present a single-pass, $(1.8+\varepsilon)$-factor, $O(d/\varepsilon)$-space data stream algorithm for the Euclidean 2-center problem for any $d \ge 1$. This is the first result with an approximation factor below 2 using $O(d/\varepsilon)$ space for any $d$. Our algorithm naturally extends to the Euclidean k-center problem with $k > 2$: we present a single-pass $(1.8+\varepsilon)$-factor data stream algorithm for the Euclidean k-center problem for any $d \ge 1$, which uses $O(2^k (k+3)!\, d/\varepsilon)$ space and $O(2^k (k+2)!\, d/\varepsilon)$ update time.
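
The abstract defines the k-center objective and the single-pass constraint but does not describe the algorithm itself. As a hedged illustration of those two notions only, the Python sketch below (a) evaluates the k-center objective for a fixed set of centers, and (b) maintains one enclosing ball over a stream for the k = 1 case. This is not the paper's $(1.8+\varepsilon)$-factor 2-center algorithm. The names `kcenter_radius` and `StreamingEnclosingBall` are hypothetical, and the update rule (replace the current ball by the smallest ball containing it and the new point) is a swapped-in, well-known simple baseline for the minimum enclosing ball, chosen because it makes the "examine each point once, keep little state" model concrete.

```python
import math
from typing import Iterable, List, Optional, Sequence

Point = Sequence[float]


def kcenter_radius(points: Iterable[Point], centers: Sequence[Point]) -> float:
    """Hypothetical helper: the smallest common radius r such that the
    congruent balls B(c, r), one per center c, together cover every point.
    This is the k-center objective for a fixed choice of centers."""
    return max((min(math.dist(p, c) for c in centers) for p in points), default=0.0)


class StreamingEnclosingBall:
    """Hypothetical single-pass baseline for the k = 1 case (illustration only).

    Stores one center and one radius, i.e. O(d) space, and looks at each
    input point exactly once. Not the 2-center algorithm of the paper."""

    def __init__(self) -> None:
        self.center: Optional[List[float]] = None
        self.radius: float = 0.0

    def update(self, p: Point) -> None:
        if self.center is None:
            # The first point becomes a ball of radius 0.
            self.center = [float(x) for x in p]
            return
        d = math.dist(p, self.center)
        if d <= self.radius:
            return  # already covered; the point can be discarded
        # Smallest ball containing the old ball and p: its diameter runs from
        # the far side of the old ball through p, so its radius is (r + d) / 2.
        new_radius = (self.radius + d) / 2.0
        step = (new_radius - self.radius) / d  # fraction of the way toward p
        self.center = [c + step * (x - c) for c, x in zip(self.center, p)]
        self.radius = new_radius


if __name__ == "__main__":
    stream = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (-1.0, 1.0)]
    ball = StreamingEnclosingBall()
    for q in stream:  # a single pass over the stream
        ball.update(q)
    print(ball.center, ball.radius)
    # Every point seen so far lies inside the maintained ball.
    print(kcenter_radius(stream, [ball.center]))
```

Each new ball contains the previous one, so every point seen so far stays covered, while the stored state is only a center and a radius, independent of the stream length. The paper's 2-center algorithm works under the same single-pass model with $O(d/\varepsilon)$ space.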
