Optimal space lower bounds for all frequency moments

We prove that any one-pass streaming algorithm which (ε, δ)-approximates the <i>k</i>th frequency moment <i>F</i><inf><i>k</i></inf>, for any real <i>k</i> ≠ 1 and any ε = Ω(1/√m), must use Ω(1/ε²) bits of space, where <i>m</i> is the size of the universe. This is optimal in terms of ε, resolves the open questions of Bar-Yossef <i>et al.</i> in [3, 4], and extends the Ω(1/ε²) lower bound for <i>F</i><inf>0</inf> in [11] to much smaller ε by applying novel techniques. Along the way we lower bound the one-way communication complexity of approximating the Hamming distance and the number of bipartite graphs with minimum/maximum degree constraints.
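For concreteness, the quantity being approximated can be sketched as follows. This is a minimal illustration, assuming the standard definition of the kth frequency moment as the sum of f_i^k over the items i that occur in the stream, where f_i is the number of occurrences of i; the abstract itself does not restate this definition. The function names `frequency_moment` and `is_eps_approximation` are hypothetical, and the code computes the moment exactly with unbounded memory, so it is not a streaming algorithm and does not reflect the paper's techniques.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact k-th frequency moment: sum of f_i**k over items that occur.

    Assumed definition (not stated in the abstract). Only items with
    f_i >= 1 are counted, so k = 0 gives the number of distinct items.
    """
    counts = Counter(stream)
    return sum(f ** k for f in counts.values())


def is_eps_approximation(estimate, exact, eps):
    """Check the relative-error condition |estimate - exact| <= eps * exact."""
    return abs(estimate - exact) <= eps * exact


if __name__ == "__main__":
    stream = [1, 2, 2, 3, 3, 3]  # frequencies: 1, 2, 3
    for k in (0, 1, 2):
        print(k, frequency_moment(stream, k))
```

An (ε, δ)-approximation algorithm would output an estimate that satisfies `is_eps_approximation` with probability at least 1 − δ; the lower bound in the abstract concerns how much memory any one-pass algorithm with that guarantee needs.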

[1]  Vladimir Vapnik, A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[2]  Andrew C. Yao, et al. Lower bounds by probabilistic arguments, 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[3]  Philippe Flajolet, et al. Probabilistic Counting Algorithms for Data Base Applications, 1985, J. Comput. Syst. Sci.

[4]  I. Good. C332. Surprise indexes and p-values, 1989.

[5]  Noga Alon, et al. The Probabilistic Method, 2015, Fundamentals of Ramsey Theory.

[6]  David J. DeWitt, et al. Practical Skew Handling in Parallel Joins, 1992, VLDB.

[7]  Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.

[8]  Noam Nisan, et al. On Randomized One-round Communication Complexity, 1995, STOC '95.

[9]  C. Papadimitriou, et al. The complexity of massive data set computations, 2002.

[10]  Ziv Bar-Yossef, et al. An information statistics approach to data stream and communication complexity, 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings.

[11]  Luca Trevisan, et al. Counting Distinct Elements in a Data Stream, 2002, RANDOM.

[12]  Brendan D. McKay, et al. Asymptotic Enumeration of Graphs with a Given Upper Bound on the Maximum Degree, 2002, Combinatorics, Probability and Computing.

[13]  Ziv Bar-Yossef, et al. Information theory methods in communication complexity, 2002, Proceedings 17th IEEE Annual Conference on Computational Complexity.

[14]  P. Indyk, et al. Comparing Data Streams Using Hamming Norms (How to Zero In), 2002, Very Large Data Bases Conference.

[15]  David P. Woodruff, et al. Tight lower bounds for the distinct elements problem, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.