Ensemble Estimation of Distributional Functionals via $k$-Nearest Neighbors

The problem of accurately estimating distributional functionals (integral functionals of one or more probability distributions) nonparametrically has received recent interest because such functionals arise widely in signal processing, information theory, machine learning, and statistics. In particular, $k$-nearest neighbor ($k$-nn) based methods have attracted considerable attention due to their adaptive nature and relatively low computational complexity. We derive the mean squared error (MSE) convergence rates of leave-one-out $k$-nn plug-in density estimators for a large class of distributional functionals without boundary correction. We then apply the theory of optimally weighted ensemble estimation to obtain weighted ensemble estimators that achieve the parametric MSE rate under assumptions that are competitive with the state of the art. We also derive the asymptotic distributions of these estimators, which are unknown for all other $k$-nn based distributional functional estimators, enabling hypothesis testing.
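
To make the construction concrete, the sketch below (assuming i.i.d. Euclidean data and using Shannon entropy as the target functional) illustrates a leave-one-out $k$-nn plug-in estimator and a simple ensemble over several values of $k$. The function names are illustrative, and the uniform placeholder weights merely stand in for the optimally weighted solution described above, which chooses the weights (e.g., via a convex optimization) so that the slowly decaying bias terms of the individual estimators cancel while the variance stays controlled.

```python
# Minimal sketch of a leave-one-out k-nn plug-in estimator for a distributional
# functional (here the Shannon entropy H(f) = -E[log f(X)]), plus a simple
# ensemble over several values of k. Names and the uniform weighting are
# illustrative placeholders, not the paper's optimally weighted construction.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma


def loo_knn_density(X, k):
    """Leave-one-out k-nn density estimate at each sample point.

    f_hat(X_i) = k / ((N - 1) * c_d * rho_k(X_i)^d), where rho_k(X_i) is the
    distance from X_i to its k-th nearest neighbor among the other N - 1
    points and c_d is the volume of the unit ball in d dimensions.
    """
    N, d = X.shape
    tree = cKDTree(X)
    # query k + 1 neighbors because the nearest neighbor of X_i is X_i itself
    dists, _ = tree.query(X, k=k + 1)
    rho_k = dists[:, -1]
    c_d = np.pi ** (d / 2) / gamma(d / 2 + 1)
    return k / ((N - 1) * c_d * rho_k ** d)


def knn_entropy(X, k):
    """Plug-in estimate of H(f) = -E[log f(X)] using the LOO k-nn density."""
    f_hat = loo_knn_density(X, k)
    return -np.mean(np.log(f_hat))


def ensemble_entropy(X, k_values, weights=None):
    """Weighted ensemble of plug-in estimators over several k values.

    Any weights summing to one yield a consistent ensemble; the optimal
    weights (not computed here) are chosen to cancel the lower-order bias
    terms of the individual estimators.
    """
    k_values = np.asarray(k_values)
    if weights is None:
        weights = np.full(len(k_values), 1.0 / len(k_values))  # placeholder
    estimates = np.array([knn_entropy(X, k) for k in k_values])
    return float(np.dot(weights, estimates))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 2))  # N(0, I_2): true entropy = log(2*pi*e)
    print("single-k estimate :", knn_entropy(X, k=10))
    print("ensemble estimate :", ensemble_entropy(X, k_values=[5, 10, 20, 40]))
    print("true entropy      :", np.log(2 * np.pi * np.e))
```

On this two-dimensional Gaussian example both estimates should land near the true entropy $\log(2\pi e) \approx 2.84$; the benefit of optimal weighting shows up mainly in higher dimensions, where the bias of any single plug-in estimator decays slowly and the weighted ensemble is what restores the parametric MSE rate.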
