Enhancing Nearest Neighbor Based Entropy Estimator for High Dimensional Distributions via Bootstrapping Local Ellipsoid

We develop an improved, ellipsoid-based kNN entropy estimator for high-dimensional distributions from random samples. We argue that the inaccuracy of the classical kNN estimator in high-dimensional spaces stems from its local uniformity assumption, and the proposed method relaxes this assumption through two key extensions: a local ellipsoid-based volume correction and a correction acceptance testing procedure. We provide supporting theoretical results, and experiments ranging from simple to complex cases show that the proposed estimator effectively reduces bias, especially in high dimensions, outperforming current state-of-the-art alternative estimators.
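To make the setup concrete, the following is a minimal Python sketch of the classical Kozachenko-Leonenko kNN entropy estimator that the abstract builds on, together with a hypothetical local-ellipsoid volume correction illustrating the general idea. The function names, the local shape-matrix construction, and the axis-scaling choice are assumptions for illustration only; the paper's actual correction and its acceptance test may differ.

```python
# Minimal sketch: classical kNN (Kozachenko-Leonenko) entropy estimator and a
# hypothetical ellipsoid-corrected variant (an assumption, not the paper's exact method).
import numpy as np
from scipy.special import digamma, gammaln
from scipy.spatial import cKDTree

def knn_entropy(x, k=5):
    """Classical Kozachenko-Leonenko differential entropy estimate (in nats)."""
    n, d = x.shape
    tree = cKDTree(x)
    # Distance to the k-th nearest neighbor, excluding the query point itself.
    eps = tree.query(x, k=k + 1)[0][:, -1]
    log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit d-ball
    return digamma(n) - digamma(k) + log_ball + d * np.mean(np.log(eps))

def knn_entropy_ellipsoid(x, k=5):
    """Sketch of an ellipsoid-corrected variant: the kNN ball volume is replaced by the
    volume of an ellipsoid fitted to the local neighbors. This is an illustrative
    assumption; the paper additionally applies a correction acceptance test."""
    n, d = x.shape
    tree = cKDTree(x)
    dist, idx = tree.query(x, k=k + 1)
    log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    log_correction = 0.0
    for i in range(n):
        nbrs = x[idx[i, 1:]] - x[i]        # local neighborhood centered at x_i
        cov = nbrs.T @ nbrs / k            # local second-moment (shape) matrix
        evals = np.maximum(np.linalg.eigvalsh(cov), 1e-12)
        # Ellipsoid with the same shape as the local covariance and the same maximum
        # extent as the kNN ball: log volume relative to the ball is
        # 0.5 * sum(log(evals / evals.max())).
        log_correction += 0.5 * np.sum(np.log(evals / evals.max()))
    return (digamma(n) - digamma(k) + log_ball
            + d * np.mean(np.log(dist[:, -1])) + log_correction / n)
```

As a sanity check, for a d-dimensional Gaussian with covariance Sigma the true differential entropy is (1/2) log((2*pi*e)^d det Sigma), so the bias of both variants can be measured directly on simulated samples of increasing dimension.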
