The Space Complexity of Approximating the Frequency Moments

The frequency moments of a sequence containing $m_i$ elements of type $i$, for $1 \le i \le n$, are the numbers $F_k = \sum_{i=1}^{n} m_i^k$. We consider the space complexity of randomized algorithms that approximate the numbers $F_k$ when the elements of the sequence are given one by one and cannot be stored. Surprisingly, it turns out that $F_0$, $F_1$, and $F_2$ can be approximated in logarithmic space, whereas approximating $F_k$ for $k \ge 6$ requires $n^{\Omega(1)}$ space. Applications to databases are also mentioned.
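The abstract states the $F_2$ result without describing a method. As an illustration only, the sketch below contrasts an exact $F_k$ computation, which must keep a count for every distinct element, with a well-known random-sign ("tug-of-war") estimator for $F_2$ that keeps a fixed number of counters regardless of $n$. Assumptions, all ours: elements are integers in $[0, P)$ with $P = 2^{31} - 1$; each counter's sign function is a random cubic polynomial modulo $P$ (4-wise independent, with a negligible parity bias of order $1/P$); the names `frequency_moment`, `TugOfWarF2`, `groups`, and `per_group` are hypothetical, and the parameter values are illustrative rather than derived from a target accuracy.

```python
import random
from collections import Counter
from statistics import mean, median

# Assumption: stream elements are integers in [0, P); P is a Mersenne prime.
P = 2**31 - 1


def frequency_moment(stream, k):
    """Exact F_k = sum_i m_i**k. Stores one count per distinct element,
    so its space grows with the number of distinct elements."""
    counts = Counter(stream)  # only elements with m_i >= 1 appear
    return sum(m ** k for m in counts.values())


class TugOfWarF2:
    """Hypothetical small-space F_2 estimator using random +/-1 signs.

    Each counter j keeps Z_j = sum over the stream of sign_j(x); then
    E[Z_j**2] = F_2 (up to the tiny parity bias). Counters are averaged
    within groups and the median of the group means is returned.
    """

    def __init__(self, groups, per_group, seed=None):
        rng = random.Random(seed)
        self.groups = groups
        self.per_group = per_group
        total = groups * per_group
        # One random cubic polynomial mod P per counter (4-wise independent hash).
        self.coeffs = [[rng.randrange(P) for _ in range(4)] for _ in range(total)]
        self.z = [0] * total

    @staticmethod
    def _sign(c, x):
        # Horner evaluation of c0*x^3 + c1*x^2 + c2*x + c3 modulo P.
        h = (((c[0] * x + c[1]) * x + c[2]) * x + c[3]) % P
        return 1 if h & 1 else -1

    def update(self, x):
        # Process one stream element; the element itself is not stored.
        for j, c in enumerate(self.coeffs):
            self.z[j] += self._sign(c, x)

    def estimate(self):
        squares = [z * z for z in self.z]
        group_means = [
            mean(squares[g * self.per_group:(g + 1) * self.per_group])
            for g in range(self.groups)
        ]
        return median(group_means)
```

Averaging within a group reduces the variance of each $Z_j^2$, and taking the median across groups makes a badly wrong answer unlikely. Each counter stores four coefficients and one running sum, so memory depends on the number of counters and on $\log P$ and $\log m$, not on how many distinct elements appear; the exact baseline has no such bound.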
