Approximate Voronoi cell computation on spatial data streams

Several studies have exploited the properties of Voronoi diagrams to improve the efficiency of variants of nearest neighbor search on stored datasets. However, the significance of Voronoi diagrams and their basic building blocks, Voronoi cells, has been neglected when geometric data become available incrementally as a data stream. In this paper, we study the problem of computing the Voronoi cells of fixed 2-d site points when the locations of their neighboring sites arrive as a spatial data stream. We show that the non-streaming solution to the problem does not meet the memory requirements of many realistic scenarios over a sliding window. Hence, we propose AVC-SW, an approximate streaming algorithm that computes (1 + ε)-approximations of the exact Voronoi cell in O(κ), where κ is the sample size. Under the sliding window model with random arrival of points, we show both analytically and experimentally that, for a given window size w and parameter k, AVC-SW reduces the expected memory requirement of the classic algorithm from O(w) to O(k log(w/k + 1)), regardless of the distribution of the points in the 2-d space. This is a significant improvement in most real-world scenarios, where w ≫ k.
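For reference, the exact Voronoi cell of a fixed site q is the intersection of the half-planes {x : |x − q| ≤ |x − p|}, one per neighboring point p. The Python sketch below recomputes that intersection from scratch over the points currently in a window by clipping a bounding box with each perpendicular bisector. It is a generic exact baseline, not AVC-SW, and it assumes a finite bounding box because a true cell may be unbounded. The names `voronoi_cell`, `clip_halfplane`, and `polygon_area` are hypothetical. Because removing (expiring) a point can enlarge the cell, a naive exact approach like this one has to keep every point of the window, which is where the O(w) memory of a classic approach comes from.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def clip_halfplane(poly: List[Point], a: float, b: float, c: float) -> List[Point]:
    """Keep the part of convex polygon `poly` where a*x + b*y <= c (one Sutherland-Hodgman step)."""
    out: List[Point] = []
    n = len(poly)
    for i in range(n):
        p, q = poly[i], poly[(i + 1) % n]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            out.append(p)  # p lies inside (or on) the half-plane
        if (fp < 0 < fq) or (fq < 0 < fp):
            t = fp / (fp - fq)  # edge p->q crosses the boundary line
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def voronoi_cell(site: Point, neighbors: List[Point],
                 box: Tuple[float, float, float, float]) -> List[Point]:
    """Exact Voronoi cell of `site` w.r.t. `neighbors`, clipped to box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    cell = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    sx, sy = site
    for px, py in neighbors:
        if (px, py) == (sx, sy):
            continue  # a coincident point defines no bisector
        # |x - s|^2 <= |x - p|^2  <=>  2(p - s).x <= |p|^2 - |s|^2
        a = 2.0 * (px - sx)
        b = 2.0 * (py - sy)
        c = px * px + py * py - sx * sx - sy * sy
        cell = clip_halfplane(cell, a, b, c)
        if not cell:
            break
    return cell


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


if __name__ == "__main__":
    window = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0), (5.0, 5.0)]
    cell = voronoi_cell((0.0, 0.0), window, (-10.0, -10.0, 10.0, 10.0))
    print(len(cell), round(polygon_area(cell), 6))
```

With four neighbors at distance 2 on the axes, the cell of the origin is the square [-1, 1] × [-1, 1]; dropping the neighbor at (2, 0), as if it had expired from the window, enlarges the cell, which the baseline can only reflect if it still stores the remaining points.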
