Multivariate f-divergence Estimation With Confidence

Estimating f-divergence is an important problem in machine learning, information theory, and statistics. While several nonparametric divergence estimators exist, relatively few have known convergence properties. In particular, even for those estimators whose MSE convergence rates are known, the asymptotic distributions are unknown. We establish the asymptotic normality of a recently proposed ensemble estimator of the f-divergence between two distributions from a finite number of samples. This estimator has an MSE convergence rate of O(1/T), is simple to implement, and performs well in high dimensions. This theory enables us to perform divergence-based inference tasks, such as testing the equality of pairs of distributions based on empirical samples. We experimentally validate our theoretical results and, as an illustration, use them to empirically bound the best achievable classification error.
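
As a rough illustration of why asymptotic normality matters for inference, the sketch below turns a divergence estimate into a normal-approximation confidence interval and a one-sided p-value. It is not the estimator or the test procedure from the paper: the function names, the `std_error` input (assumed to come from some separate variance estimate, for example a bootstrap), and the `null_value` are all hypothetical and chosen only for this example.

```python
from statistics import NormalDist


def normal_confidence_interval(estimate, std_error, level=0.95):
    """Two-sided interval, assuming the estimate is approximately normal.

    `std_error` is an assumed, externally supplied standard error.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    # Standard normal quantile for the requested coverage level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


def one_sided_p_value(estimate, std_error, null_value=0.0):
    """P-value for H0: divergence == null_value vs. H1: divergence > null_value.

    `null_value` is a hypothetical reference value for the divergence
    under the null hypothesis of equal distributions.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = (estimate - null_value) / std_error
    # Upper-tail probability of the standard normal at z.
    return 1.0 - NormalDist().cdf(z)


if __name__ == "__main__":
    # Made-up numbers, used only to show the call pattern.
    low, high = normal_confidence_interval(estimate=0.5, std_error=0.1)
    print(f"95% interval: ({low:.3f}, {high:.3f})")
    print(f"p-value: {one_sided_p_value(0.5, 0.1):.2e}")
```

The normal approximation is only as good as the variance estimate and the sample size allow, so an interval of this kind is best read as approximate.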
