The frequent items problem, under polynomial decay, in the streaming model

We consider the problem of estimating the frequency count of data stream elements under polynomial decay functions. In this setting every element in the stream is assigned a time-decreasing weight, given by a non-increasing polynomial function. Decay functions are used in applications where older data is less significant, less interesting, or less reliable than recent data. Consider a data stream of $N$ elements drawn from a universe $U$. We propose three poly-logarithmic algorithms for the problem. The first, deterministic, uses $O\left(\frac{1}{\epsilon^2}\log N\,(\log\log N + \log U)\right)$ bits, where $\epsilon \in (0,1)$ is the approximation parameter. The second, probabilistic, uses $O\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta\epsilon}\right)$ bits or $O\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\right)$ bits, depending on the decay function parameter, where $\delta \in (0,1)$ is the probability of failure. The third, deterministic in the stochastic model, uses $O\left(\frac{1}{\epsilon}\right)$ bits or $O\left(\frac{1}{\epsilon^2}\log N\right)$ bits, also depending on the decay parameter, as described in this paper. This variant of the problem is important and has many applications; to our knowledge, it has not been studied before.
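To make the estimated quantity concrete, the following is a minimal sketch of the exact, linear-space baseline, not the poly-logarithmic algorithms described above. It assumes a polynomial decay of the hypothetical form $g(a) = (a+1)^{-\alpha}$, where $a$ is an element's age (0 for the newest element), and a hypothetical threshold rule in which an item is "frequent" when its decayed count is at least a fraction `phi` of the total decayed weight. The names `poly_decay`, `decayed_counts`, and `heavy_items` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def poly_decay(age: int, alpha: float) -> float:
    """Assumed polynomial decay weight g(age) = (age + 1) ** -alpha.

    age = 0 for the most recent element; non-increasing in age for alpha >= 0.
    """
    return (age + 1) ** (-alpha)


def decayed_counts(stream: Sequence[Hashable], alpha: float) -> Dict[Hashable, float]:
    """Exact decayed frequency of every item, evaluated after the whole stream.

    Stores the entire stream (linear space): this is the reference quantity a
    small-space streaming algorithm would approximate, not such an algorithm.
    """
    n = len(stream)
    counts: Dict[Hashable, float] = defaultdict(float)
    for i, item in enumerate(stream):
        age = n - 1 - i  # the last element of the stream has age 0
        counts[item] += poly_decay(age, alpha)
    return dict(counts)


def heavy_items(stream: Sequence[Hashable], alpha: float, phi: float) -> List[Hashable]:
    """Items whose decayed count is at least phi times the total decayed weight
    (an assumed definition of 'frequent'), ordered by decreasing decayed count."""
    counts = decayed_counts(stream, alpha)
    total = sum(counts.values())
    frequent = [x for x, c in counts.items() if c >= phi * total]
    return sorted(frequent, key=lambda x: -counts[x])


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "b"]
    for item, c in sorted(decayed_counts(stream, alpha=1.0).items()):
        print(item, round(c, 4))  # a 0.9167, b 1.2, c 0.3333
    print(heavy_items(stream, alpha=1.0, phi=0.3))  # ['b', 'a']
```

In the example, "a" occurs most often, yet "b" has the larger decayed count because one of its occurrences is the newest element. Unlike exponential decay, where a per-item counter can be kept up to date by multiplying it by one fixed factor per step, the ratio $g(a+1)/g(a) = \left(\frac{a+1}{a+2}\right)^{\alpha}$ depends on the age $a$. No single rescaling therefore updates all counters, which is one reason small-space algorithms for polynomial decay are not straightforward.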
