Optimal Approximations of the Frequency Moments

We give a 1-pass $\tilde{O}(m^{1-2/k})$-space algorithm for computing the $k$-th frequency moment of a data stream for any real $k > 2$. Together with the lower bounds of [1, 2, 3], this resolves the main problem left open by Alon et al. in 1996 [1]. Our algorithm also works for streams with deletions and thus gives an $\tilde{O}(m^{1-2/p})$-space algorithm for the $L_p$ difference problem for any $p > 2$. This essentially matches the known $\Omega(m^{1-2/p-o(1)})$ lower bound of [10, 2]. Finally, the update time of our algorithms is $\tilde{O}(1)$.
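For reference, the quantity being approximated is the $k$-th frequency moment $F_k = \sum_i |f_i|^k$, where $f_i$ is the net count of item $i$ after all insertions and deletions. The $L_p$ difference of two streams is $(F_p)^{1/p}$ of the vector $a - b$, which can be obtained by feeding one stream with $+1$ updates and the other with $-1$ updates. The sketch below is only a naive exact baseline under these standard definitions, not the algorithm of this paper: it keeps one counter per distinct item (linear space), which is precisely the cost a sublinear-space streaming algorithm avoids. The function names `frequency_moment` and `lp_difference` are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def frequency_moment(updates: Iterable[Tuple[int, int]], k: float) -> float:
    """Exact F_k = sum_i |f_i|^k over a stream of (item, delta) updates.

    Hypothetical baseline: one counter per distinct item, i.e. linear space,
    which sublinear-space streaming algorithms approximate instead.
    """
    freq = defaultdict(int)
    for item, delta in updates:
        freq[item] += delta  # deletions arrive as negative deltas
    # Items whose net count returned to zero contribute nothing.
    return sum(abs(f) ** k for f in freq.values() if f != 0)


def lp_difference(stream_a: List[int], stream_b: List[int], p: float) -> float:
    """Exact L_p distance between the frequency vectors of two insert-only streams."""
    # Stream A contributes +1 per occurrence and stream B -1, so the
    # accumulated frequency vector is a - b.
    combined = [(x, 1) for x in stream_a] + [(x, -1) for x in stream_b]
    return frequency_moment(combined, p) ** (1.0 / p)
```

For the insert-only stream 1, 1, 2, 3, 3, 3 the frequencies are 2, 1 and 3, so $F_3 = 8 + 1 + 27 = 36$.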

[1] Ravi Kumar et al. An improved data stream algorithm for frequency moments, 2004, SODA '04.

[2] Mikkel Thorup et al. Tabulation based 4-universal hashing with applications to second moment estimation, 2004, SODA '04.

[3] S. Muthukrishnan et al. Data streams: algorithms and applications, 2005, SODA '03.

[4] Subhash Khot et al. Near-optimal lower bounds on the multi-party communication complexity of set disjointness, 2003, 18th IEEE Annual Conference on Computational Complexity.

[5] Ziv Bar-Yossef et al. An information statistics approach to data stream and communication complexity, 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science.

[6] Moses Charikar et al. Finding frequent items in data streams, 2002, Theor. Comput. Sci.

[7] Michael E. Saks et al. Space lower bounds for distance approximation in the data stream model, 2002, STOC '02.

[8] Mahesh Viswanathan et al. An approximate L1-difference algorithm for massive data streams, 2002, SIAM J. Comput.

[9] P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[10] Noga Alon et al. The space complexity of approximating the frequency moments, 1996, STOC '96.