Estimating hybrid frequency moments of data streams

We consider the problem of estimating hybrid frequency moments of two-dimensional data streams. In this model, the data is viewed as a matrix $(A_{i,j})_{1 \le i,j \le n}$. The entries $A_{i,j}$ are updated coordinate-wise, in arbitrary order and possibly multiple times. The updates include both increments and decrements to the current value of $A_{i,j}$. The hybrid frequency moment $F_{p,q}(A)$ is defined as $\sum_{j=1}^{n}(\sum_{i=1}^{n}{|A_{i,j}|}^{p})^{q}$ and generalizes the frequency moment of one-dimensional data streams.

We present the first $\tilde{O}(1)$ space algorithm for estimating $F_{p,q}$ for $p \in [0,2]$ and $q \in [0,1]$ to within an approximation factor of $1 \pm \epsilon$. The $\tilde{O}$ notation hides poly-logarithmic factors in the stream size $m$ and the matrix size $n$, and polynomial factors in $\epsilon^{-1}$. We also present the first $\tilde{O}(n^{1-1/q})$ space algorithm for estimating $F_{p,q}$ for $p \in [0,2]$ and $q \in (1,2]$.
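To make the definition concrete, the following is a minimal sketch that computes $F_{p,q}$ exactly from a stream of updates. It is a linear-space baseline for checking the definition on small inputs, not the small-space algorithm of the paper. The function name `hybrid_moment`, the `(i, j, delta)` update format, and the convention that zero entries and empty columns contribute nothing (so that $p = 0$ counts non-zero entries) are assumptions made for illustration.

```python
from collections import defaultdict


def hybrid_moment(updates, p, q):
    """Exact F_{p,q} of the matrix defined by a stream of (i, j, delta) updates.

    Assumption: zero entries contribute nothing, so p = 0 counts the
    non-zero entries of each column, and columns that are entirely zero
    are skipped (which matters only when q = 0).
    """
    # Apply increments and decrements coordinate-wise, in stream order.
    entries = defaultdict(float)
    for i, j, delta in updates:
        entries[(i, j)] += delta

    # Inner sum: for each column j, add up |A[i][j]|^p over the rows i.
    column_sums = defaultdict(float)
    for (i, j), value in entries.items():
        if value != 0:
            column_sums[j] += abs(value) ** p

    # Outer sum: raise each column total to the power q and add them up.
    return sum(total ** q for total in column_sums.values())


# A small stream; entry (1, 1) is incremented, cancelled, then set to 1.
stream = [(1, 1, 3), (2, 1, 4), (1, 2, 5), (1, 1, -3), (2, 2, 2), (1, 1, 1)]
f_2_1 = hybrid_moment(stream, p=2, q=1)  # (1 + 16) + (25 + 4)
f_0_1 = hybrid_moment(stream, p=0, q=1)  # number of non-zero entries
```

This baseline stores every non-zero entry, so its space grows with the matrix; the algorithms described above instead aim for $\tilde{O}(1)$ or $\tilde{O}(n^{1-1/q})$ space at the cost of a $1 \pm \epsilon$ approximation.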
