A simpler and more efficient deterministic scheme for finding frequent items over sliding windows

In this paper, we give a simple scheme for identifying ε-approximate frequent items over a sliding window of size <i>n</i>. Our scheme is deterministic and makes no assumption about the distribution of the item frequencies. It supports <i>O</i>(1/ε) update and query time and uses <i>O</i>(1/ε) space. Its main data structures are just a few short queues whose entries store the positions of some items in the sliding window. We also extend our scheme to variable-size windows; the extended scheme uses <i>O</i>((1/ε) log(ε<i>n</i>)) space.
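To make the problem statement concrete, the following is a minimal sketch of an exact baseline, not the scheme of this paper: it stores the whole window and therefore needs <i>O</i>(<i>n</i>) space, which is the cost the <i>O</i>(1/ε)-space scheme avoids. It assumes the usual reading of "ε-approximate" with a query threshold θ (a symbol not defined in the abstract): an answer must contain every item occurring at least θ<i>n</i> times in the window and no item occurring fewer than (θ − ε)<i>n</i> times. The names `ExactWindowCounter` and `is_valid_answer` are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque


class ExactWindowCounter:
    """Exact baseline over the last n items; uses O(n) space.

    This is NOT the paper's algorithm. It only pins down what a
    correct answer looks like, under the assumed definition above.
    """

    def __init__(self, n):
        self.n = n
        self.window = deque()    # items currently inside the window
        self.counts = Counter()  # exact frequency of each item in the window

    def update(self, item):
        """Append one stream item and expire the oldest if the window is full."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.n:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequent(self, theta):
        """Return the items whose exact count is at least theta * window size."""
        threshold = theta * len(self.window)
        return {x for x, c in self.counts.items() if c >= threshold}


def is_valid_answer(reported, counts, size, theta, eps):
    """Check a reported set against the assumed eps-approximate guarantee."""
    # Every truly frequent item must be reported.
    no_misses = all(x in reported for x, c in counts.items() if c >= theta * size)
    # No reported item may fall below the relaxed threshold (theta - eps) * size.
    no_false = all(counts.get(x, 0) >= (theta - eps) * size for x in reported)
    return no_misses and no_false


if __name__ == "__main__":
    w = ExactWindowCounter(n=4)
    for item in "abacadae":
        w.update(item)
    print(sorted(w.window), w.frequent(0.5))
```

An approximate scheme may report a superset of the exact answer, as long as every extra item still clears the relaxed threshold; `is_valid_answer` checks exactly that slack.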
