The frequent items problem, under polynomial decay, in the streaming model

We consider the problem of estimating the frequency counts of data stream elements under polynomial decay functions. In this setting, every element that arrives in the stream is assigned a weight that decreases with time according to a non-increasing polynomial function. Decay functions are used in applications where older data is less significant, interesting, or reliable than recent data. We propose three poly-logarithmic algorithms for the problem. The first, deterministic, uses $O(\frac{1}{\epsilon^{2}} \log N (\log \log N + \log U))$ bits. The second, probabilistic, uses $O(\frac{1}{\epsilon^{2}} \log \frac{1}{\epsilon \delta} \log N)$ bits. The third, deterministic in the stochastic model, uses $O(\frac{1}{\epsilon^{2}} \log N)$ bits. In addition, we show that allowing an additional additive error can, in some cases, improve the space bounds. To our knowledge, this variant of the problem, which has many applications, has not been studied before.
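
To make the estimated quantity concrete, the following is a minimal sketch of the exact decayed frequency count, computed naively in linear space. It is a baseline for illustration only and is not one of the three algorithms proposed here. The decay form $1/(a+1)^{\alpha}$ for an element of age $a$, the threshold parameter `phi`, and the function names are assumptions chosen for the example; the abstract does not fix them.

```python
from collections import defaultdict


def decayed_frequencies(stream, now, alpha=1.0):
    """Exact decayed count of every item at time `now` (naive baseline).

    stream: iterable of (arrival_time, item) pairs with arrival_time <= now.
    An arrival of age a = now - arrival_time contributes weight
    1 / (a + 1) ** alpha, an assumed form of polynomial decay.
    """
    totals = defaultdict(float)
    for arrival_time, item in stream:
        age = now - arrival_time
        if age < 0:
            raise ValueError("arrival_time must not exceed now")
        totals[item] += 1.0 / (age + 1) ** alpha
    return dict(totals)


def frequent_items(stream, now, phi, alpha=1.0):
    """Items whose decayed count is at least phi times the total decayed weight."""
    freqs = decayed_frequencies(stream, now, alpha)
    total = sum(freqs.values())
    return {item for item, weight in freqs.items() if weight >= phi * total}


if __name__ == "__main__":
    # 'a' arrives twice but long ago; 'c' arrives once, just now.
    example = [(1, "a"), (2, "a"), (3, "b"), (10, "c")]
    print(decayed_frequencies(example, now=10))
    print(frequent_items(example, now=10, phi=0.5))
```

In the example stream, item `a` has the highest undecayed count, yet with `alpha=1` and `phi=0.5` only the most recent item `c` qualifies as frequent, which is the effect decay is meant to capture. This baseline stores every distinct item, whereas the bounds above are poly-logarithmic in $N$.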
