Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation

Estimators of information-theoretic measures, such as entropy and mutual information, are a basic workhorse for many downstream applications in modern data science. State-of-the-art approaches have been either geometric (nearest-neighbor (NN)-based) or kernel-based (with a globally chosen bandwidth). In this paper, we combine both approaches to design new estimators of entropy and mutual information that outperform the state-of-the-art methods. Our estimator uses local bandwidth choices given by $k$-NN distances with a finite $k$, independent of the sample size. Such a local and data-dependent choice ameliorates boundary bias and improves performance in practice, but the bandwidth vanishes at a fast rate, leading to a non-vanishing bias. We show that the asymptotic bias of the proposed estimator is universal: it is independent of the underlying distribution, and hence can be precomputed and subtracted from the estimate. As a byproduct, we obtain a unified way of deriving both the kernel and NN estimators. The corresponding theoretical contribution, relating the asymptotic geometry of nearest neighbors to order statistics, is of independent mathematical interest.
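
The following is a minimal illustrative sketch of the general recipe described in the abstract: a kernel density estimate whose bandwidth at each sample is its $k$-NN distance, followed by an additive, distribution-free correction. It is not the paper's exact estimator; the Gaussian kernel and the particular correction constant (here the Kozachenko-Leonenko term $\log k - \psi(k)$, used only as a stand-in) are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def knn_local_bandwidth_entropy(x, k=5):
    """Differential entropy estimate (in nats) from samples `x`.

    Sketch of the abstract's recipe: local, data-dependent bandwidths
    equal to the k-NN distance, plus a precomputed additive correction.
    The Gaussian kernel and the log(k) - digamma(k) constant below are
    illustrative stand-ins, not the paper's exact choices.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape

    # Distance from each sample to its k-th nearest neighbor
    # (k+1 neighbors queried because the point itself is returned first).
    tree = cKDTree(x)
    rho = tree.query(x, k=k + 1)[0][:, -1]

    log_dens = np.empty(n)
    for i in range(n):
        h = rho[i]                              # local bandwidth = k-NN distance
        diff = (x - x[i]) / h
        ker = np.exp(-0.5 * np.sum(diff ** 2, axis=1))
        ker[i] = 0.0                            # leave-one-out
        log_dens[i] = (np.log(ker.sum())
                       - np.log((n - 1) * (2 * np.pi) ** (d / 2) * h ** d))

    # Resubstitution entropy estimate plus a distribution-free correction.
    correction = np.log(k) - digamma(k)
    return -log_dens.mean() + correction
```

The design point mirrored here is that $k$ stays fixed as the sample size grows, so the local bandwidths shrink quickly; per the abstract, the estimator remains usable because the resulting asymptotic bias is independent of the underlying distribution and can therefore be removed by a precomputed additive constant.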
