Estimating Functions of Distributions Defined over Spaces of Unknown Size

We consider the Bayesian estimation of information-theoretic quantities from data, using a Dirichlet prior. Since both the size of the event space, m, and the concentration parameter of the Dirichlet prior, c, are typically uncertain, we treat them as random variables governed by a hyperprior. We show that this hyperprior, P(c, m), obeys a simple "Irrelevance of Unseen Variables" (IUV) desideratum if and only if P(c, m) = P(c)P(m). Requiring IUV thus greatly reduces the number of degrees of freedom of the hyperprior. Some information-theoretic quantities, e.g., mutual information, can be expressed in multiple ways, in terms of different event spaces; for instance, I(X;Y) = H(X) + H(Y) − H(X,Y) involves the event spaces of X, of Y, and of the joint variable (X,Y). Under all hyperpriors (implicitly) used in earlier work, different choices of this event space lead to different posterior expected values of these quantities. We show that there is no such dependence on the choice of event space for a hyperprior that obeys IUV. We also derive a result that exploits IUV to greatly simplify calculations of quantities such as the posterior expected mutual information or posterior expected multi-information. We then present computer experiments in which an IUV-based entropy estimator compares favorably to three alternative estimators in common use. We end by discussing how seemingly innocuous changes to the formalization of an estimation problem can substantially affect the resultant estimates of posterior expectations.
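To make the setup concrete, the following is a minimal numerical sketch, not the paper's exact derivation, of the kind of estimator described above: the posterior expected entropy under a symmetric Dirichlet prior over m bins, averaged over a factored, IUV-consistent hyperprior P(c)P(m). The grids, the uniform hyperprior on those grids, and the convention that c is the total concentration (so each bin receives the parameter c/m) are all illustrative assumptions.

```python
# Sketch only: posterior expected entropy under a symmetric Dirichlet(c/m, ..., c/m)
# prior, averaged over a factored hyperprior P(c, m) = P(c) P(m) as IUV requires.
# Grid ranges and the uniform hyperprior on them are illustrative assumptions.
import numpy as np
from scipy.special import gammaln, digamma

def log_evidence(counts, c, m):
    """log P(counts | c, m) for a Dirichlet-multinomial with per-bin parameter
    c/m over m bins. The multinomial coefficient is omitted: it is constant in
    (c, m) and cancels when the posterior weights are normalized."""
    n = counts.sum()
    a = c / m
    ll = gammaln(c) - gammaln(n + c)
    ll += np.sum(gammaln(counts + a) - gammaln(a))  # observed bins
    # the m - len(counts) unseen bins each contribute gammaln(a) - gammaln(a) = 0
    return ll

def expected_entropy(counts, c, m):
    """Posterior mean entropy E[H | counts, c, m] in nats, using the standard
    Dirichlet-posterior identity E[H] = psi(A+1) - sum_i (a_i/A) psi(a_i+1),
    where a_i = n_i + c/m and A = N + c."""
    n = counts.sum()
    a = c / m
    k = len(counts)
    h = digamma(n + c + 1)
    h -= np.sum((counts + a) / (n + c) * digamma(counts + a + 1))
    h -= (m - k) * a / (n + c) * digamma(a + 1)  # the m - k unseen bins
    return h

def iuv_entropy_estimate(counts, c_grid, m_grid):
    """Average E[H | c, m] over the hyperposterior induced by a factored
    (IUV-consistent) hyperprior, here taken uniform on both grids."""
    counts = np.asarray(counts, dtype=float)
    k = len(counts)
    log_w, h_vals = [], []
    for m in m_grid:
        if m < k:  # the event space cannot be smaller than the observed support
            continue
        for c in c_grid:
            log_w.append(log_evidence(counts, c, m))
            h_vals.append(expected_entropy(counts, c, m))
    log_w = np.asarray(log_w)
    w = np.exp(log_w - log_w.max())  # numerically stable normalization
    w /= w.sum()
    return float(np.dot(w, h_vals))

# Example: 40 draws from a distribution over an event space of unknown size
counts = [17, 9, 6, 4, 2, 1, 1]
print(iuv_entropy_estimate(counts,
                           c_grid=np.logspace(-2, 2, 25),
                           m_grid=range(7, 200, 4)))
```

The factorization of the hyperprior is what IUV demands; with a non-factored P(c, m), the weights attached to each (c, m) pair, and hence the estimate, would depend on how the event space is carved up.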
