Nonparametric von Mises Estimators for Entropies, Divergences and Mutual Informations

We propose and analyze estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are derived from the von Mises expansion and are based on the theory of influence functions, which appear in the semiparametric statistics literature. We show that estimators based on either data-splitting or a leave-one-out technique enjoy fast rates of convergence and other favorable theoretical properties. We apply this framework to derive estimators for several popular information-theoretic quantities and, via empirical evaluation, show the advantage of this approach over existing estimators.
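To make the construction concrete, recall the first-order von Mises expansion T(Q) ≈ T(P) + ∫ ψ(x; P) dQ(x), where ψ is the influence function of the functional T. For Shannon entropy H(p) = -∫ p log p, the influence function is ψ(x; p) = -log p(x) - H(p), so the one-step correction of a plug-in estimate collapses to a held-out average: H(p̂) + (1/n) Σ_i ψ(X_i; p̂) = -(1/n) Σ_i log p̂(X_i). The sketch below illustrates the resulting data-splitting estimator under stated assumptions: it uses a Gaussian KDE as the preliminary density estimator (a stand-in for the carefully tuned nonparametric estimators the analysis would require), and the function name entropy_data_split is ours, not from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

def entropy_data_split(x, seed=0):
    """Illustrative data-splitting (one-step von Mises) estimator of
    Shannon entropy for a 1-D sample.

    Fit a preliminary density estimate on one half of the data, then
    evaluate the first-order-corrected estimate on the other half.  For
    Shannon entropy the plug-in term and the influence-function
    correction combine into a held-out average of -log p_hat.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    idx = rng.permutation(len(x))
    half = len(x) // 2
    train, test = x[idx[:half]], x[idx[half:]]

    p_hat = gaussian_kde(train)      # preliminary density estimate
    log_p = np.log(p_hat(test))      # log-density at held-out points
    return -log_p.mean()             # one-step entropy estimate

# Sanity check against the true entropy of a standard normal,
# H = 0.5 * log(2 * pi * e) ~ 1.4189.
sample = np.random.default_rng(1).standard_normal(5000)
print(entropy_data_split(sample))
```

The leave-one-out variant mentioned in the abstract follows the same template, refitting the density estimate with each evaluation point held out rather than splitting the sample once.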
