A Tight Lower Bound for High Frequency Moment Estimation with Small Error

We show an \(\Omega((n^{1-2/p} \log M)/\varepsilon^2)\) bits of space lower bound for \((1+\varepsilon)\)-approximating the p-th frequency moment \(F_p = \|x\|_p^p = \sum_{i=1}^n |x_i|^p\) of a vector \(x \in \{-M, -M+1, \ldots, M\}^n\) with constant probability in the turnstile model for data streams, for any \(p > 2\) and \(\varepsilon \ge 1/n^{1/p}\) (we require \(\varepsilon \ge 1/n^{1/p}\) since there is a trivial \(O(n \log M)\) upper bound). This lower bound matches the space complexity of an upper bound of Ganguly for any \(\varepsilon\) [...] 2 and \(\varepsilon \ge 1/n^{1/p}\). This is again optimal for \(\varepsilon < 1/\log^{O(1)} n\).
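
As a quick check on the threshold in the parenthetical (a worked substitution added for illustration, not taken from the paper), setting \(\varepsilon = 1/n^{1/p}\) in the lower bound gives

\[
\frac{n^{1-2/p} \log M}{\varepsilon^{2}} = n^{1-2/p} \cdot n^{2/p} \cdot \log M = n \log M,
\]

which meets the trivial \(O(n \log M)\) upper bound obtained by storing \(x\) exactly, one \(O(\log M)\)-bit counter per coordinate. For smaller \(\varepsilon\) the expression would exceed that upper bound, so the lower bound can only be claimed for \(\varepsilon \ge 1/n^{1/p}\).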
