Minimax Optimal Estimation of KL Divergence for Continuous Distributions

Estimating the Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains. One simple and effective estimator is based on the $k$-nearest neighbor distances between these samples. In this paper, we analyze the convergence rates of the bias and variance of this estimator. Furthermore, we derive a lower bound on the minimax mean square error and show that the $k$-NN method is asymptotically rate optimal.
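
To make the estimator concrete, the Python sketch below implements the commonly analyzed fixed-$k$ nearest neighbor form of the KL divergence estimator, which plugs ratios of $k$-NN distances into $\hat{D} = \frac{d}{n}\sum_{i=1}^{n}\log\frac{\nu_k(i)}{\rho_k(i)} + \log\frac{m}{n-1}$. This is a minimal illustration under the assumption that the paper's estimator follows this standard form; the function name `knn_kl_divergence` and the use of `scipy.spatial.cKDTree` are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_kl_divergence(x, y, k=1):
    """Sketch of a fixed-k nearest neighbor estimator of D(P || Q).

    x : (n, d) array of i.i.d. samples from P
    y : (m, d) array of i.i.d. samples from Q
    k : number of nearest neighbors (fixed)

    Assumes the samples are in general position (no ties or duplicates),
    so that all k-NN distances are strictly positive.
    """
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    n, d = x.shape
    m = y.shape[0]

    # rho_k(i): distance from x_i to its k-th nearest neighbor among the
    # other x samples (query k+1 neighbors, since x_i itself is returned
    # at distance 0).
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]

    # nu_k(i): distance from x_i to its k-th nearest neighbor among the
    # y samples.
    nu = cKDTree(y).query(x, k=k)[0]
    if k > 1:
        nu = nu[:, -1]

    # Plug-in estimate: (d/n) * sum_i log(nu_i / rho_i) + log(m / (n - 1)).
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))


if __name__ == "__main__":
    # Toy check: D(N(0,1) || N(1,1)) = 0.5 in one dimension.
    rng = np.random.default_rng(0)
    p_samples = rng.normal(0.0, 1.0, size=(5000, 1))
    q_samples = rng.normal(1.0, 1.0, size=(5000, 1))
    print(knn_kl_divergence(p_samples, q_samples, k=1))
```

With a fixed $k$ the estimator is computationally cheap (two KD-tree queries), which is part of what motivates analyzing its bias, variance, and minimax optimality.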
