Analysis of KNN Information Estimators for Smooth Distributions

The KSG mutual information estimator, which is based on the distance of each sample to its $k$-th nearest neighbor, is widely used to estimate the mutual information between two continuous random variables. Existing work has analyzed the convergence rate of this estimator for random variables whose densities are bounded away from zero on their support. In practice, however, the KSG estimator also performs well for a much broader class of distributions, including not only those with bounded support and densities bounded away from zero, but also those with bounded support and densities approaching zero, as well as those with unbounded support. In this paper, we analyze the convergence rate of the error of the KSG estimator for smooth distributions whose densities may have either bounded or unbounded support. Since the KSG mutual information estimator can be viewed as an adaptive recombination of Kozachenko-Leonenko (KL) entropy estimators, our analysis also provides a convergence analysis of the KL entropy estimator for a broad class of distributions.
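
As a concrete illustration of the two estimators discussed above, the following is a minimal Python sketch of the KL (Kozachenko-Leonenko) entropy estimator and of the KSG mutual information estimator (the first algorithm of Kraskov et al., using the max-norm in the joint space). The function names (`kl_entropy`, `ksg_mi`), the choice of `k`, and the Gaussian sanity check are illustrative assumptions, not part of the paper; the formulas are the standard ones from the literature.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln


def kl_entropy(x, k=3):
    """Kozachenko-Leonenko differential entropy estimate, in nats."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    tree = cKDTree(x)
    # Distance of each sample to its k-th nearest neighbor (the query returns
    # the point itself at distance 0, so ask for k+1 neighbors and take column k).
    eps = tree.query(x, k=k + 1)[0][:, k]
    log_unit_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return -digamma(k) + digamma(n) + log_unit_ball + d * np.mean(np.log(eps))


def ksg_mi(x, y, k=3):
    """KSG mutual information estimate of I(X;Y), in nats."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.shape[0]
    xy = np.hstack([x, y])
    tree_xy, tree_x, tree_y = cKDTree(xy), cKDTree(x), cKDTree(y)
    # k-th nearest-neighbor distance in the joint space under the max-norm.
    eps = tree_xy.query(xy, k=k + 1, p=np.inf)[0][:, k]
    # n_x(i), n_y(i): marginal samples strictly closer than eps(i), excluding
    # the sample itself (hence the "- 1"); the small offset enforces the
    # strict inequality.
    nx = np.array([len(tree_x.query_ball_point(x[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point(y[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))


if __name__ == "__main__":
    # Toy check on a correlated Gaussian pair, where I(X;Y) = -0.5 * log(1 - rho^2).
    rng = np.random.default_rng(0)
    rho = 0.8
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=5000)
    x, y = z[:, :1], z[:, 1:]
    print("KSG estimate:", ksg_mi(x, y, k=5))
    print("true MI     :", -0.5 * np.log(1 - rho ** 2))
```

Note that the KSG estimator combines the joint-space nearest-neighbor distance with neighbor counts in each marginal space, which is how it can be interpreted as an adaptive recombination of KL entropy estimates for $X$, $Y$, and $(X,Y)$.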
