Approximate counts and quantiles over sliding windows

We consider the problem of maintaining ε-approximate counts and quantiles over a stream sliding window using limited space. We consider two types of sliding windows depending on whether the number of elements N in the window is fixed (fixed-size sliding window) or variable (variable-size sliding window). In a fixed-size sliding window, both the ends of the window slide synchronously over the stream. In a variable-size sliding window, an adversary slides the window ends independently, and therefore has the ability to vary the number of elements N in the window.We present various deterministic and randomized algorithms for approximate counts and quantiles. All of our algorithms require O(1/ε polylog(1/ε, N)) space. For quantiles, this space requirement is an improvement over the previous best bound of O(1/ε2 polylog(1/ε, N)). We believe that no previous work on space-efficient approximate counts over sliding windows exists.

[1]  Yossi Matias,et al.  New sampling-based summary statistics for improving approximate query answers , 1998, SIGMOD '98.

[2]  Manuel Blum,et al.  Time Bounds for Selection , 1973, J. Comput. Syst. Sci..

[3]  Graham Cormode,et al.  An improved data stream summary: the count-min sketch and its applications , 2004, J. Algorithms.

[4]  Richard M. Karp,et al.  A simple algorithm for finding frequent elements in streams and bags , 2003, TODS.

[5]  Piotr Indyk,et al.  Maintaining Stream Statistics over Sliding Windows , 2002, SIAM J. Comput..

[6]  Ronald L. Rivest,et al.  Expected time bounds for selection , 1975, Commun. ACM.

[7]  Bruce G. Lindsay,et al.  Random sampling techniques for space efficient online computation of order statistics of large datasets , 1999, SIGMOD '99.

[8]  Srikanta Tirthapura,et al.  Distributed Streams Algorithms for Sliding Windows , 2002, SPAA '02.

[9]  Mike Paterson,et al.  Progress in Selection , 1996, SWAT.

[10]  Hannu Toivonen,et al.  Sampling Large Databases for Association Rules , 1996, VLDB.

[11]  Moses Charikar,et al.  Finding frequent items in data streams , 2002, Theor. Comput. Sci..

[12]  Jayadev Misra,et al.  Finding Repeated Elements , 1982, Sci. Comput. Program..

[13]  Rajeev Motwani,et al.  Sampling from a moving window over streaming data , 2002, SODA '02.

[14]  Piotr Indyk,et al.  Maintaining stream statistics over sliding windows: (extended abstract) , 2002, SODA '02.

[15]  Graham Cormode,et al.  What's hot and what's not: tracking most frequent items dynamically , 2003, PODS '03.

[16]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[17]  Hongjun Lu,et al.  Continuously maintaining quantile summaries of the most recent N elements over a data stream , 2004, Proceedings. 20th International Conference on Data Engineering.

[18]  David J. DeWitt,et al.  Parallel sorting on a shared-nothing architecture using probabilistic splitting , 1991, [1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems.

[19]  Rajeev Motwani,et al.  Maintaining variance and k-medians over data stream windows , 2003, PODS.

[20]  Jennifer Widom,et al.  Models and issues in data stream systems , 2002, PODS.

[21]  Bruce G. Lindsay,et al.  Approximate medians and other quantiles in one pass and with limited memory , 1998, SIGMOD '98.

[22]  J. Ian Munro,et al.  Selection and sorting with limited storage , 1978, 19th Annual Symposium on Foundations of Computer Science (sfcs 1978).

[23]  Erik D. Demaine,et al.  Frequency Estimation of Internet Packet Streams with Limited Space , 2002, ESA.

[24]  Jeffrey Scott Vitter,et al.  Random sampling with a reservoir , 1985, TOMS.

[25]  Sanjeev Khanna,et al.  Space-efficient online computation of quantile summaries , 2001, SIGMOD '01.