Streaming algorithms for estimating entropy

We give a method for estimating the empirical Shannon entropy of a distribution in the streaming model of computation. Our approach reduces this problem to the well-studied problem of estimating frequency moments. The analysis rests on new results that establish quantitative bounds on the rate at which Rényi entropy converges to Shannon entropy.
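The reduction can be illustrated with a minimal sketch. For a stream of length m with item counts m_i, the frequency moment is F_α = Σ m_i^α, and the Rényi entropy of order α equals log(F_α / m^α) / (1 − α), which tends to the Shannon entropy as α → 1. The code below computes F_α exactly from stored counts, so it is not a streaming algorithm; a streaming version would replace `frequency_moment` with a small-space estimator. All function names and the example stream are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def frequency_moment(counts, alpha):
    """Exact frequency moment F_alpha = sum_i m_i^alpha (assumption: counts fit in memory)."""
    return sum(c ** alpha for c in counts)


def shannon_entropy(counts):
    """Empirical Shannon entropy in bits of the distribution p_i = m_i / m."""
    m = sum(counts)
    return -sum((c / m) * math.log2(c / m) for c in counts)


def renyi_entropy(counts, alpha):
    """Renyi entropy of order alpha != 1 in bits, written in terms of F_alpha."""
    m = sum(counts)
    f_alpha = frequency_moment(counts, alpha)
    # log2(sum_i p_i^alpha) = log2(F_alpha) - alpha * log2(m)
    return (math.log2(f_alpha) - alpha * math.log2(m)) / (1 - alpha)


if __name__ == "__main__":
    stream = list("abracadabra")  # hypothetical toy stream
    counts = list(Counter(stream).values())
    print("Shannon:", shannon_entropy(counts))
    for alpha in (2.0, 1.5, 1.1, 1.01, 1.001):
        print("alpha =", alpha, "Renyi:", renyi_entropy(counts, alpha))
```

Running the script prints Rényi entropies that approach the Shannon value as α approaches 1. How close α must be to 1 for a given accuracy is what the quantitative convergence bounds address; the sketch only shows the qualitative behavior.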
