Dimension-Free Empirical Entropy Estimation

We seek an entropy estimator for discrete distributions with fully empirical accuracy bounds. As stated, this goal is infeasible without some prior assumptions on the distribution. We discover that a certain information moment assumption renders the problem feasible. We argue that the moment assumption is natural and, in some sense, minimalistic — weaker than finite support or tail decay conditions. Under the moment assumption, we provide the first finite-sample entropy estimates for infinite alphabets, nearly recovering the known minimax rates. Moreover, we demonstrate that our empirical bounds are significantly sharper than the state-of-the-art bounds, for various natural distributions and non-trivial sample regimes. Along the way, we give a dimension-free analogue of the Cover-Thomas result on entropy continuity (with respect to total variation distance) for finite alphabets, which may be of independent interest.
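
To make the quantities mentioned above concrete, the following Python sketch computes the plug-in (maximum-likelihood) entropy estimate from i.i.d. samples of a discrete distribution, together with an empirical proxy for an information moment of the form E[(log 1/p(X))^lambda]. This is only an illustration of the objects the abstract refers to, not the estimator or the precise moment condition analyzed in the paper; the function names, the default choice lambda = 2, and the geometric example source are hypothetical and chosen purely for demonstration.

    # Illustrative sketch only (not the paper's estimator).
    # Plug-in (maximum-likelihood) entropy estimate from i.i.d. samples of a
    # discrete distribution, plus an empirical proxy for the lambda-th
    # information moment E[(log 1/p(X))^lambda].
    from collections import Counter
    import math
    import random

    def plugin_entropy(samples):
        """Plug-in (empirical) Shannon entropy, in nats."""
        n = len(samples)
        return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

    def empirical_information_moment(samples, lam=2.0):
        """(1/n) * sum_i (log 1/p_hat(x_i))**lam, with p_hat the empirical measure."""
        n = len(samples)
        return sum((c / n) * math.log(n / c) ** lam for c in Counter(samples).values())

    if __name__ == "__main__":
        random.seed(0)
        # Geometric source on {0, 1, 2, ...}: a simple infinite-alphabet example.
        samples = [int(math.log(random.random()) / math.log(0.7)) for _ in range(10_000)]
        print("plug-in entropy (nats):", plugin_entropy(samples))
        print("empirical 2nd information moment:", empirical_information_moment(samples))

On a heavy-tailed source the plug-in estimate can be badly biased at moderate sample sizes; a finite information moment of this kind is the sort of prior assumption under which fully empirical accuracy bounds become possible.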

[1] H. Scheffé, A Useful Convergence Theorem for Probability Distributions, 1947.

[2] Yanjun Han et al., Maximum Likelihood Estimation of Functionals of Discrete Distributions, 2014, IEEE Transactions on Information Theory.

[3] David Brink et al., A (probably) exact solution to the Birthday Problem, 2012.

[4] Igal Sason et al., Entropy Bounds for Discrete Random Variables via Maximal Coupling, 2012, IEEE Transactions on Information Theory.

[5] D. Berend et al., A sharp estimate of the binomial mean absolute deviation with applications, 2013.

[6] Alexandre B. Tsybakov et al., Introduction to Nonparametric Estimation, 2008, Springer Series in Statistics.

[7] Yihong Wu et al., Minimax Rates of Entropy Estimation on Large Alphabets via Best Polynomial Approximation, 2014, IEEE Transactions on Information Theory.

[8] Sergio Verdú et al., Empirical Estimation of Information Measures: A Literature Guide, 2019, Entropy.

[9] Gregory Valiant et al., Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs, 2011, STOC '11.

[10] Helmut Jürgensen et al., Entropy and Higher Moments of Information, 2010, J. Univers. Comput. Sci.

[11] Ioannis Kontoyiannis et al., Estimating the entropy of discrete distributions, 2001, IEEE International Symposium on Information Theory (ISIT).

[12] Daniel Berend et al., The Expected Missing Mass under an Entropy Constraint, 2017, Entropy.

[13] Norbert Kusolitsch, Why the theorem of Scheffé should be rather called a theorem of Riesz, 2010, Period. Math. Hung.

[14] Piotr Indyk et al., Estimating Entropy of Distributions in Constant Space, 2019, NeurIPS.

[15] Jun Sakuma et al., Minimax Optimal Additive Functional Estimation with Discrete Distribution: Slow Divergence Speed Case, 2018, IEEE International Symposium on Information Theory (ISIT).

[16] A. Antos et al., Convergence properties of functional estimates for discrete distributions, 2001.

[17] Zhengmin Zhang et al., Estimating Mutual Information Via Kolmogorov Distance, 2007, IEEE Transactions on Information Theory.

[18] Raymond W. Yeung et al., The Interplay Between Entropy and Variational Distance, 2007, IEEE Transactions on Information Theory.

[19] Gregory Valiant et al., The Power of Linear Estimators, 2011, IEEE 52nd Annual Symposium on Foundations of Computer Science (FOCS).

[20] Shahar Mendelson et al., Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey, 2019, Found. Comput. Math.

[21] S. Golomb et al., The information generating function of a probability distribution (Corresp.), 1966, IEEE Trans. Inf. Theory.

[22] Thomas M. Cover et al., Elements of Information Theory, 2005.

[23] Alon Orlitsky et al., Data Amplification: Instance-Optimal Property Estimation, 2019, ICML.

[24] Alex Samorodnitsky et al., Approximating entropy from sublinear samples, 2007, SODA '07.

[25] Alon Orlitsky et al., A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions, 2017, ICML.

[26] Alon Orlitsky et al., Data Amplification: A Unified and Competitive Approach to Property Estimation, 2019, NeurIPS.

[27] Paul Valiant et al., Estimating the Unseen, 2013, NIPS.

[28] Yanjun Han et al., Adaptive estimation of Shannon entropy, 2015, IEEE International Symposium on Information Theory (ISIT).

[29] P. Massart, The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality, 1990.

[30] K. Audenaert, A sharp continuity estimate for the von Neumann entropy, 2006, quant-ph/0610146.

[31] Liam Paninski et al., Estimating entropy on m bins given fewer than m samples, 2004, IEEE Transactions on Information Theory.

[32] Jorge F. Silva et al., Shannon Entropy Estimation in ∞-Alphabets from Convergence Results: Studying Plug-In Estimators, 2017, Entropy.

[33] Yanjun Han et al., Minimax Estimation of Functionals of Discrete Distributions, 2014, IEEE Transactions on Information Theory.

[34] Jun Sakuma et al., Minimax optimal estimators for additive scalar functionals of discrete distributions, 2017, IEEE International Symposium on Information Theory (ISIT).

[35] Liam Paninski et al., Estimation of Entropy and Mutual Information, 2003, Neural Computation.