Formal Limitations on the Measurement of Mutual Information

Measuring mutual information from finite data is difficult. Recent work has considered variational methods that maximize a lower bound on mutual information. In this paper, we prove that serious statistical limitations are inherent to any method of measuring mutual information. More specifically, we show that any distribution-free, high-confidence lower bound on mutual information estimated from N samples cannot be larger than O(ln N).
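
To illustrate how such a ceiling shows up in practice, here is a minimal sketch (our own illustration, not the paper's construction) of a contrastive, InfoNCE-style lower-bound estimator: because each positive-pair score is normalized against an average over N candidate pairs, the resulting estimate can never exceed ln N, no matter how strong the true dependence is. The critic and variable names below are assumptions chosen for the toy example.

```python
import numpy as np

def infonce_lower_bound(scores):
    """InfoNCE-style lower-bound estimate of I(X; Y) from an N x N critic-score matrix.

    scores[i, j] is the critic value for the pair (x_i, y_j); the diagonal holds the
    true (positive) pairs. Each per-sample term is log of a ratio that is at most N,
    so the estimate is capped at ln(N) by construction.
    """
    n = scores.shape[0]
    log_mean_exp = np.log(np.exp(scores).sum(axis=1)) - np.log(n)
    return float((np.diag(scores) - log_mean_exp).mean())

# Toy example with a hypothetical squared-distance critic: y is an almost
# deterministic function of x, so the true mutual information is very large,
# yet the estimate stays below ln(N).
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = x + 1e-3 * rng.normal(size=n)
scores = -(x[:, None] - y[None, :]) ** 2          # critic scores for all pairs
print(infonce_lower_bound(scores), np.log(n))     # estimate <= ln(1000) ≈ 6.91
```

The point of the sketch is only that this particular family of estimators is capped at ln N by its own normalization; the paper's result is the stronger statement that any distribution-free, high-confidence lower bound from N samples is subject to an O(ln N) limit.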
