CR-precis: A Deterministic Summary Structure for Update Data Streams

We present deterministic sub-linear space algorithms for problems over update data streams, including estimating frequencies of items and ranges, finding approximate frequent items and approximate ϕ-quantiles, estimating inner products, constructing near-optimal B-bucket histograms, and estimating entropy. We also present improved lower bounds for several problems over update data streams.
