Nonparametric Estimation of Renyi Divergence and Friends

We consider nonparametric estimation of L_2, Renyi-α and Tsallis-α divergences between continuous distributions. Our approach is to construct estimators for particular integral functionals of two densities and translate them into divergence estimators. For the integral functionals, our estimators are based on corrections of a preliminary plug-in estimator. We show that these estimators achieve the parametric convergence rate of n^{-1/2} when the smoothness s of both densities is at least d/4, where d is the dimension. We also derive minimax lower bounds for this problem which confirm that s > d/4 is necessary to achieve the n^{-1/2} rate of convergence. We validate our theoretical guarantees with a number of simulations.
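
The two-step recipe (estimate an integral functional, then translate it into a divergence) can be sketched in one dimension. For T(p, q) = ∫ p^α q^(1-α), the Renyi-α divergence is log(T)/(α - 1) and the Tsallis-α divergence is (T - 1)/(α - 1). The sketch below is a minimal illustration under stated assumptions, not the estimator analyzed in the paper and not the one whose n^{-1/2} guarantee is stated above: it assumes Gaussian kernel density estimates with a hand-picked bandwidth, α in (0, 1), trapezoid-rule integration on a fixed interval, and a simple sample-splitting, first-order influence-function correction of the plug-in value. All function names (kde_1d, plug_in, corrected_estimate, renyi_divergence, tsallis_divergence) are hypothetical.

```python
import math
import random


def kde_1d(sample, bandwidth):
    """Hypothetical helper: 1-D Gaussian kernel density estimate returned as a callable."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def plug_in(p_hat, q_hat, alpha, lo, hi, grid=1000):
    """Trapezoid-rule value of T(p_hat, q_hat) = integral of p_hat^alpha * q_hat^(1 - alpha) over [lo, hi]."""
    step = (hi - lo) / grid
    total = 0.0
    for k in range(grid + 1):
        x = lo + k * step
        weight = 0.5 if k in (0, grid) else 1.0
        total += weight * p_hat(x) ** alpha * q_hat(x) ** (1.0 - alpha)
    return total * step


def corrected_estimate(xs, ys, alpha, bandwidth, lo, hi):
    """Plug-in estimate plus an assumed first-order (influence-function) correction.

    Returns (plug-in value, corrected value). alpha must lie in (0, 1) here.
    """
    beta = 1.0 - alpha
    hx, hy = len(xs) // 2, len(ys) // 2
    # First halves of each sample build the density estimates.
    p_hat = kde_1d(xs[:hx], bandwidth)
    q_hat = kde_1d(ys[:hy], bandwidth)
    t_plug = plug_in(p_hat, q_hat, alpha, lo, hi)
    # Second halves average the estimated influence functions:
    #   psi_p(x) = alpha * p^(alpha-1)(x) q^beta(x) - alpha * T
    #   psi_q(y) = beta * p^alpha(y) q^(beta-1)(y) - beta * T
    rest_x, rest_y = xs[hx:], ys[hy:]
    corr_p = sum(alpha * p_hat(x) ** (alpha - 1.0) * q_hat(x) ** beta - alpha * t_plug
                 for x in rest_x) / len(rest_x)
    corr_q = sum(beta * p_hat(y) ** alpha * q_hat(y) ** (beta - 1.0) - beta * t_plug
                 for y in rest_y) / len(rest_y)
    return t_plug, t_plug + corr_p + corr_q


def renyi_divergence(t, alpha):
    """Renyi-alpha divergence from T = integral of p^alpha q^(1 - alpha)."""
    return math.log(t) / (alpha - 1.0)


def tsallis_divergence(t, alpha):
    """Tsallis-alpha divergence from the same functional T."""
    return (t - 1.0) / (alpha - 1.0)


if __name__ == "__main__":
    random.seed(0)
    alpha = 0.5
    xs = [random.gauss(0.0, 1.0) for _ in range(500)]  # samples from p = N(0, 1)
    ys = [random.gauss(1.0, 1.0) for _ in range(500)]  # samples from q = N(1, 1)
    t_plug, t_corr = corrected_estimate(xs, ys, alpha, bandwidth=0.5, lo=-6.0, hi=7.0)
    # Closed form for these two Gaussians: T = exp(-1/8), so the Renyi-0.5 divergence is 0.25.
    print("plug-in:", t_plug, "corrected:", t_corr, "exact:", math.exp(-0.125))
    print("Renyi:", renyi_divergence(t_corr, alpha), "Tsallis:", tsallis_divergence(t_corr, alpha))
```

Sample splitting keeps the density estimates independent of the points used to average the correction. With β = 1 - α, the -αT and -βT terms in the correction sum to -T, so the plug-in value cancels algebraically and only the two averages remain. The L_2 divergence does not fit this single functional; it would instead be assembled from ∫ p², ∫ pq and ∫ q², since ∫ (p - q)² = ∫ p² - 2∫ pq + ∫ q².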