The space complexity of approximating the frequency moments

The frequency moments of a sequence containing $m_i$ elements of type $i$, for $1 \le i \le n$, are the numbers $F_k = \sum_{i=1}^{n} m_i^k$. We consider the space complexity of randomized algorithms that approximate the numbers $F_k$, when the elements of the sequence are given one by one and cannot be stored. Surprisingly, it turns out that the numbers $F_0$, $F_1$ and $F_2$ can be approximated in logarithmic space, whereas the approximation of $F_k$ for $k \ge 6$ requires $n^{\Omega(1)}$ space. Applications to databases are mentioned as well.
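To make the definition concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the construction analyzed in the paper. The names `frequency_moment`, `make_sign_hash`, and `estimate_f2` are hypothetical, items are assumed to be non-negative integers, and the sign hash is a random cubic polynomial over a prime field whose low bit is only approximately unbiased. The first function computes $F_k$ exactly by storing every count. The last function estimates $F_2$ from one pass by keeping only a small number of signed running totals and averaging their squares. A plain mean is used here for brevity; a more robust variant would take a median of several such means.

```python
import random
from collections import Counter

# Mersenne prime used as the hash field size (an assumption of this sketch).
_PRIME = (1 << 61) - 1


def frequency_moment(stream, k):
    """Exact F_k = sum_i m_i**k, where m_i is the count of item i.

    This stores one counter per distinct item, so it is the baseline,
    not a small-space algorithm.
    """
    counts = Counter(stream)
    if k == 0:
        return len(counts)  # F_0 is the number of distinct items
    return sum(m ** k for m in counts.values())


def make_sign_hash(rng):
    """Return a function mapping an integer item to +1 or -1.

    A random degree-3 polynomial over Z_p gives 4-wise independent values;
    taking the low bit as the sign is a slight approximation.
    """
    coeffs = [rng.randrange(_PRIME) for _ in range(4)]

    def sign(x):
        h = 0
        for c in coeffs:  # Horner evaluation of the polynomial at x
            h = (h * x + c) % _PRIME
        return 1 if h & 1 else -1

    return sign


def estimate_f2(stream, copies=256, seed=0):
    """One-pass estimate of F_2 using `copies` signed running totals."""
    rng = random.Random(seed)
    signs = [make_sign_hash(rng) for _ in range(copies)]
    totals = [0] * copies
    for item in stream:  # each item is seen once and never stored
        for j, sign in enumerate(signs):
            totals[j] += sign(item)
    # Each squared total has expectation close to F_2; average to cut variance.
    return sum(t * t for t in totals) / copies
```

For the sequence `[1, 1, 2, 3, 3, 3]` the counts are 2, 1 and 3, so `frequency_moment` returns 3, 6 and 14 for k = 0, 1 and 2. The estimator's memory use depends on `copies` rather than on the number of distinct items, which is the point of the logarithmic-space results stated in the abstract.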
