Parametric Bayesian Estimation of Differential Entropy and Relative Entropy

Given i.i.d. samples drawn from a distribution of known parametric form, we propose minimizing expected Bregman divergence to form Bayesian estimates of differential entropy and relative entropy, and we derive such estimators for the uniform, Gaussian, Wishart, and inverse Wishart distributions. We also give formulas for a log-gamma Bregman divergence and for the differential entropy and relative entropy of the Wishart and inverse Wishart distributions. As with any Bayesian estimate, the results depend on the accuracy of the prior parameters, but example simulations show that performance can improve substantially over maximum likelihood and state-of-the-art nonparametric estimators.
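
Minimizing expected Bregman divergence has a concrete consequence. When the unknown entropy H is the first argument of a Bregman divergence d_phi(H, h_hat), the value of h_hat that minimizes E[d_phi(H, h_hat) | data] is the posterior mean E[H | data], for any strictly convex phi. The Python sketch below applies this to two of the cases named above under assumed conjugate priors: a Pareto(alpha0, xm0) prior on the endpoint theta of U(0, theta), and an inverse-gamma(a0, b0) prior on the variance of a Gaussian with known mean mu. The priors, the known-mean simplification, and the function names (bayes_entropy_uniform, bayes_entropy_gaussian_known_mean, digamma) are illustrative assumptions, not necessarily the estimators derived in the paper.

```python
import math
import random
from typing import Sequence


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    if x <= 0.0:
        raise ValueError("digamma is only implemented here for x > 0")
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x: shift x up until the asymptotic series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def bayes_entropy_uniform(samples: Sequence[float], alpha0: float, xm0: float) -> float:
    """Posterior mean of h = log(theta) for U(0, theta), assuming a Pareto(alpha0, xm0) prior on theta."""
    # Conjugate update (assumed prior): the posterior is Pareto(alpha0 + n, max(xm0, max sample)).
    alpha_n = alpha0 + len(samples)
    xm_n = max(xm0, max(samples))
    # For theta ~ Pareto(alpha, xm), log(theta / xm) ~ Exponential(alpha), so E[log theta] = log xm + 1/alpha.
    return math.log(xm_n) + 1.0 / alpha_n


def bayes_entropy_gaussian_known_mean(samples: Sequence[float], mu: float, a0: float, b0: float) -> float:
    """Posterior mean of h = 0.5 * log(2*pi*e*sigma^2), assuming known mean mu and an InvGamma(a0, b0) prior on sigma^2."""
    # Conjugate update for the variance of a Gaussian whose mean is assumed known.
    a_n = a0 + 0.5 * len(samples)
    b_n = b0 + 0.5 * sum((x - mu) ** 2 for x in samples)
    # If sigma^2 ~ InvGamma(a, b), then 1/sigma^2 ~ Gamma(shape a, rate b) and E[log sigma^2] = log b - psi(a).
    return 0.5 * math.log(2.0 * math.pi * math.e) + 0.5 * (math.log(b_n) - digamma(a_n))


if __name__ == "__main__":
    rng = random.Random(0)

    # Uniform case: compare the ML plug-in log(max x_i) with the posterior-mean estimate.
    theta = 2.0
    xs = [rng.uniform(0.0, theta) for _ in range(10)]
    print("uniform  true:", math.log(theta))
    print("uniform  ML plug-in:", math.log(max(xs)))
    print("uniform  Bayes (Pareto prior):", bayes_entropy_uniform(xs, alpha0=1.0, xm0=1.0))

    # Gaussian case with known mean 0: compare the ML plug-in with the posterior-mean estimate.
    sigma = 1.5
    ys = [rng.gauss(0.0, sigma) for _ in range(10)]
    ml_var = sum(y * y for y in ys) / len(ys)
    print("gaussian true:", 0.5 * math.log(2 * math.pi * math.e * sigma ** 2))
    print("gaussian ML plug-in:", 0.5 * math.log(2 * math.pi * math.e * ml_var))
    print("gaussian Bayes (InvGamma prior):", bayes_entropy_gaussian_known_mean(ys, 0.0, a0=2.0, b0=2.0))
```

Under the assumed Pareto prior the uniform estimate is log max(xm0, max x_i) + 1/(alpha0 + n), which always exceeds the maximum-likelihood plug-in log max x_i. This offsets the plug-in's downward bias: the sample maximum can never exceed theta, so log max x_i systematically underestimates log theta.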