Generalizing the Layering Method of Indyk and Woodruff: Recursive Sketches for Frequency-Based Vectors on Streams

In their ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to compute the k-th frequency moment \(F_k\) (for \(k > 2\)) in space \(O(\mathrm{polylog}(n,m) \cdot n^{1-2/k})\), giving the first result that is optimal up to polylogarithmic factors in n and m (here m is the length of the stream and n is the size of the domain). The method of Indyk and Woodruff reduces the problem of computing \(F_k\) to the problem of finding heavy hitters in a streaming manner. Their reduction requires only polylogarithmic overhead in terms of space complexity and is based on the fundamental idea of “layering.” Since 2005 the method of Indyk and Woodruff has been used in numerous applications and has become a standard tool for streaming computations.
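To make the layering idea concrete, here is a minimal offline sketch, not the streaming algorithm itself. It assumes the standard definition \(F_k = \sum_i f_i^k\), where \(f_i\) is the number of occurrences of element i, and it assumes that layers are geometric ranges of frequencies with ratio \(1 + \epsilon\). The names `exact_fk`, `layered_fk`, and `eps` are hypothetical and chosen for illustration only. The sketch keeps exact counts in linear space; a streaming algorithm would instead have to estimate the layer sizes in small space, which is where heavy hitters come in.

```python
from collections import Counter
import math


def exact_fk(stream, k):
    """Exact k-th frequency moment: sum over distinct items of count**k."""
    counts = Counter(stream)
    return sum(c ** k for c in counts.values())


def layered_fk(stream, k, eps=0.5):
    """Estimate F_k from layer sizes only (offline illustration).

    Layer j holds items whose count c satisfies
    (1 + eps)**j <= c < (1 + eps)**(j + 1).
    Each item is rounded down to its layer's lower boundary, so the
    result underestimates F_k by a factor of at most (1 + eps)**k.
    """
    counts = Counter(stream)
    layer_sizes = Counter()
    for c in counts.values():
        j = math.floor(math.log(c) / math.log(1 + eps))
        layer_sizes[j] += 1
    return sum(size * (1 + eps) ** (j * k) for j, size in layer_sizes.items())


if __name__ == "__main__":
    stream = ["a"] * 9 + ["b"] * 4 + ["c"] * 4 + ["d"]
    print("exact F_3:", exact_fk(stream, 3))
    print("layered estimate of F_3:", layered_fk(stream, 3, eps=0.5))
```

The point of the illustration is that the estimate depends only on how many items fall into each layer, not on which items they are, so approximating the layer sizes is enough to approximate \(F_k\).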
