Adaptive sampling for geometric problems over data streams

Geometric coordinates are an integral part of many data streams. Examples include sensor locations in environmental monitoring, vehicle locations in traffic monitoring or battlefield simulations, and scientific measurements of Earth or atmospheric phenomena. This paper focuses on summarizing such geometric data streams in limited storage so that many natural geometric queries can be answered faithfully. Examples of such queries are: report the smallest convex region in which a chemical leak has been sensed, track the diameter of the dataset, or track the extent of the dataset in any given direction. One can also pose queries over multiple streams: for instance, track the minimum distance between the convex hulls of two data streams, report when datasets A and B are no longer linearly separable, or report when the points of data stream A become completely surrounded by points of data stream B. These queries extend easily to more than two streams.

In this paper, we propose an adaptive sampling scheme that gives provably optimal error bounds for extremal problems of this nature. All our results follow from a single technique for computing the approximate convex hull of a point stream in a single pass. Our main result is this: given a stream of two-dimensional points and an integer r, we can maintain an adaptive sample of at most 2r+1 points such that the distance between the true convex hull and the convex hull of the sample points is O(D/r^2), where D is the diameter of the sample set. The amortized time for processing each point in the stream is O(log r). Using the sample convex hull, all the queries mentioned above can be answered approximately in either O(log r) or O(r) time.
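The abstract does not state the adaptive refinement rule, so the sketch below does not reproduce the paper's scheme. As an assumption, it implements the simpler non-adaptive baseline that such a scheme improves on: for k fixed, uniformly spaced directions, keep the stream point that is extreme in each direction, then answer extent and diameter queries on the convex hull of that sample. All names (`DirectionalExtremaSample`, `convex_hull`, `extent`, `diameter`) are hypothetical. With fixed directions, a long hull edge between two adjacent directions can hide stream points at a distance on the order of D/k, so this baseline does not meet the O(D/r^2) bound, and each update costs O(k) rather than O(log r) amortized.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class DirectionalExtremaSample:
    """Hypothetical baseline: keep the extreme stream point in each of k
    fixed, uniformly spaced directions (not the paper's adaptive scheme)."""

    def __init__(self, k: int) -> None:
        # Unit vectors at angles 2*pi*i/k for i = 0..k-1.
        self.dirs = [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
                     for i in range(k)]
        # best[i] is the point seen so far with the largest projection on dirs[i].
        self.best: List[Optional[Point]] = [None] * k

    def insert(self, p: Point) -> None:
        # One pass over the directions: O(k) work per stream point.
        for i, (ux, uy) in enumerate(self.dirs):
            q = self.best[i]
            if q is None or p[0] * ux + p[1] * uy > q[0] * ux + q[1] * uy:
                self.best[i] = p

    def sample(self) -> List[Point]:
        # At most k distinct points; one point may be extreme in several directions.
        return sorted({q for q in self.best if q is not None})


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain; it repeats the first point of the other.
    return lower[:-1] + upper[:-1]


def extent(points: List[Point], theta: float) -> float:
    """Width of the given points measured along the direction at angle theta."""
    ux, uy = math.cos(theta), math.sin(theta)
    proj = [x * ux + y * uy for x, y in points]
    return max(proj) - min(proj)


def diameter(hull: List[Point]) -> float:
    """Largest pairwise distance, by brute force over the hull vertices."""
    return max((math.dist(p, q) for p in hull for q in hull), default=0.0)


if __name__ == "__main__":
    # 360 points at 1-degree spacing on the unit circle, sampled with k = 8.
    stream = [(math.cos(math.radians(d)), math.sin(math.radians(d))) for d in range(360)]
    s = DirectionalExtremaSample(k=8)
    for p in stream:
        s.insert(p)
    hull = convex_hull(s.sample())
    print(len(hull), diameter(hull), extent(hull, math.pi / 8))
```

Because the sample is a subset of the stream, every extent computed from it is a lower bound on the true extent. In the usage example the sample hull is a regular octagon: its diameter is 2, matching the stream, but its extent in direction π/8 is 2cos(π/8) ≈ 1.848 versus roughly 1.9999 for the full stream. The brute-force `diameter` takes O(h^2) time on h hull vertices; rotating calipers would bring it down to O(h).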
