Frequency Estimation over Sliding Windows

Capturing the characteristics of large data streams has received considerable attention. Constraints on space and time often restrict data stream processing to a single pass, and processing streams over sliding windows makes the problem harder still. In this paper, we address the problem of estimating ε-approximate frequencies in data streams over sliding windows. We are the first to propose an efficient algorithm that achieves O(1/ε) space and needs only O(1) time to process each item in the data stream and to answer a query.
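To make the problem statement concrete, the following is a minimal sketch of an exact baseline, not the algorithm proposed in the paper. It stores every item in the window, so it uses O(W) space for a window of W items, which is the cost an O(1/ε)-space summary is meant to avoid. The names ExactWindowCounter and is_eps_approximate are hypothetical, and the acceptance test assumes the commonly used definition in which an estimate is ε-approximate if it differs from the true count in the window by at most εW.

```python
from collections import Counter, deque


class ExactWindowCounter:
    """Exact frequency counts over the last `window_size` items.

    Illustrative baseline only: it keeps the whole window in memory,
    so space grows linearly with the window size.
    """

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.items = deque()     # items currently inside the window, oldest first
        self.counts = Counter()  # item -> number of occurrences in the window

    def add(self, item):
        """Process one stream item in O(1) time, evicting the oldest if needed."""
        self.items.append(item)
        self.counts[item] += 1
        if len(self.items) > self.window_size:
            expired = self.items.popleft()
            self.counts[expired] -= 1
            if self.counts[expired] == 0:
                del self.counts[expired]

    def frequency(self, item):
        """Return the exact count of `item` in the current window."""
        return self.counts.get(item, 0)


def is_eps_approximate(estimate, true_count, eps, window_size):
    """Assumed definition: the error is at most eps * window_size."""
    return abs(estimate - true_count) <= eps * window_size


counter = ExactWindowCounter(window_size=4)
for symbol in ["a", "b", "a", "c", "a", "b"]:
    counter.add(symbol)
true_a = counter.frequency("a")  # count of "a" among the last 4 items
```

A space-efficient summary would replace ExactWindowCounter while still returning estimates that pass is_eps_approximate against these exact counts.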
