Stream Frequency Over Interval Queries

Stream frequency measurements are fundamental to many data stream applications, such as financial data trackers, intrusion-detection systems, and network monitoring. Recent data items are typically more relevant than old ones, a notion captured by the sliding window abstraction. This paper considers a generalized sliding window model that supports stream frequency queries over an interval specified at query time. This enables drill-down queries, which examine the behavior of the system at progressively finer granularities. For this model, we asymptotically improve the space bounds of existing work, reduce the update and query times to a constant, and provide deterministic solutions. When evaluated over real Internet packet traces, our fastest algorithm processes items 90 to 250 times faster, serves queries at least 730 times faster, and consumes at least 40% less space than the best known method.
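To make the query model concrete, the following Python sketch shows a naive exact baseline for interval frequency queries over a sliding window. It is not the algorithm proposed in the paper: it stores every arrival in the window, so its space is linear in the window size and its query time is not constant. The class name `ExactIntervalFrequency`, its methods, and the convention of addressing intervals by absolute arrival index are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict, deque


class ExactIntervalFrequency:
    """Naive exact baseline (illustrative only, not the paper's method).

    Keeps the arrival index of every item in the last `window` arrivals
    and answers frequency queries for any interval inside that window.
    """

    def __init__(self, window):
        self.window = window
        self.time = 0                         # number of items seen so far
        self.items = deque()                  # (arrival index, item) in the window
        self.positions = defaultdict(deque)   # item -> ascending arrival indices

    def update(self, item):
        """Record one arrival and evict the item that leaves the window."""
        self.time += 1
        self.items.append((self.time, item))
        self.positions[item].append(self.time)
        if len(self.items) > self.window:
            _, old_item = self.items.popleft()
            self.positions[old_item].popleft()
            if not self.positions[old_item]:
                del self.positions[old_item]

    def query(self, item, start, end):
        """Count arrivals of `item` with arrival index in [start, end].

        Arrivals that have already left the window are not counted.
        """
        # .get avoids creating an empty entry for unseen items.
        times = list(self.positions.get(item, ()))
        return bisect_right(times, end) - bisect_left(times, start)
```

A drill-down then amounts to repeating `query` with narrower intervals, for example first over the whole window and then over its most recent half. The cost of storing every arrival is what compact approximate summaries are designed to avoid.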
