Analysis of KNN Information Estimators for Smooth Distributions

The KSG mutual information estimator, which is based on the distance from each sample to its k-th nearest neighbor, is widely used to estimate the mutual information between two continuous random variables. Existing work has analyzed the convergence rate of this estimator for random variables with bounded support. In practice, however, the KSG estimator also performs well for a much broader class of distributions, including not only those with bounded support but also those with unbounded support and heavy tails. In this paper, we analyze the convergence rate of the error of the KSG estimator for smooth distributions whose support can be either bounded or unbounded. Since the KSG mutual information estimator can be viewed as an adaptive combination of Kozachenko-Leonenko (KL) entropy estimators, our analysis also provides a convergence analysis of the KL entropy estimator for a broad class of distributions.
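
As background for the two estimators named in the abstract, below is a minimal sketch (not the paper's implementation) of the Kozachenko-Leonenko entropy estimator and the KSG mutual information estimator, following Kraskov et al.'s algorithm 1 with the max-norm. It assumes SciPy's cKDTree and digamma; the function names kl_entropy and ksg_mi and the small tolerance used to approximate the strict neighbor count are illustrative choices, not anything specified by the paper.

```python
import numpy as np
from scipy.special import digamma
from scipy.spatial import cKDTree


def kl_entropy(x, k=3):
    """Kozachenko-Leonenko entropy estimate (in nats) from k-th NN distances, max-norm."""
    n, d = x.shape
    # Distance to the k-th neighbor; the query returns the point itself at distance 0.
    eps = cKDTree(x).query(x, k=k + 1, p=np.inf)[0][:, k]
    # Volume of the unit max-norm ball is 2^d, so log(c_d) = d * log(2).
    return digamma(n) - digamma(k) + d * np.log(2) + d * np.mean(np.log(eps))


def ksg_mi(x, y, k=3):
    """KSG mutual information estimate (in nats), algorithm 1 of Kraskov et al."""
    n = x.shape[0]
    xy = np.hstack((x, y))
    # k-th NN distance in the joint space (max-norm), excluding the point itself.
    eps = cKDTree(xy).query(xy, k=k + 1, p=np.inf)[0][:, k]
    # Count marginal neighbors strictly within eps_i; a tiny offset approximates
    # the strict inequality, and the -1 removes the point itself.
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

As a rough sanity check under these assumptions, ksg_mi applied to independent samples should return values close to zero, while kl_entropy on standard Gaussian samples should approach (d/2) * log(2*pi*e) as the sample size grows.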
