Finding Heavy Hitters over the Sliding Window of a Weighted Data Stream

We study the problem of identifying items with heavy weights in the sliding window of a weighted data stream. We give a deterministic algorithm that solves the problem within error bound ε, uses O(R/ε) space, and supports O(R/ε) query and update times, where R is the maximum item weight. We also show that the space can be reduced substantially in practice: for any c > 0, we can construct an O(c log R/ε²)-space algorithm that returns correct answers provided that the ratio between the total weights of any two adjacent sliding windows is at most c. Finally, we give a randomized algorithm that solves the problem with success probability 1 - δ using O(1/ε² log R log D log log D/δε) space, where D is the number of distinct items in the data stream.
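
To make the problem statement concrete, the following is a minimal exact baseline, not the algorithm of the paper. It assumes a count-based window over the most recent `window_size` arrivals, positive integer weights, and a query threshold `theta` expressed as a fraction of the window's total weight; the class and parameter names are hypothetical. It stores the whole window, so its space grows with the window size, which is the cost that the summaries described above are designed to avoid.

```python
from collections import deque, defaultdict


class ExactWindowHeavyHitters:
    """Brute-force baseline: keeps every (item, weight) pair in the window."""

    def __init__(self, window_size):
        self.window_size = window_size    # assumed: window = last N arrivals
        self.window = deque()             # (item, weight) pairs, oldest first
        self.weights = defaultdict(int)   # total weight per item in the window
        self.total = 0                    # total weight of the window

    def update(self, item, weight):
        """Append a new arrival and evict the oldest one if the window is full."""
        self.window.append((item, weight))
        self.weights[item] += weight
        self.total += weight
        if len(self.window) > self.window_size:
            old_item, old_weight = self.window.popleft()
            self.weights[old_item] -= old_weight
            self.total -= old_weight
            if self.weights[old_item] == 0:
                del self.weights[old_item]

    def heavy_hitters(self, theta):
        """Return items whose window weight is at least theta * total weight."""
        cutoff = theta * self.total
        return sorted(i for i, w in self.weights.items() if w >= cutoff)


hh = ExactWindowHeavyHitters(window_size=4)
for item, weight in [("a", 5), ("b", 1), ("c", 1), ("a", 3)]:
    hh.update(item, weight)
before = hh.heavy_hitters(0.5)   # window total is 10, item "a" holds 8

hh.update("d", 10)               # evicts ("a", 5); window total becomes 15
after = hh.heavy_hitters(0.5)
```

In this toy run the heavy hitter changes from "a" to "d" once a single heavy arrival enters and an old one expires, which shows why weighted sliding windows are harder to summarize than unit-weight ones: one update can shift the total weight by as much as R.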
