Direct Ensemble Estimation of Density Functionals

Estimating density functionals of analog sources is an important problem in statistical signal processing and information theory. Traditionally, estimating these quantities requires either making parametric assumptions about the underlying distributions or performing non-parametric density estimation followed by integration. In this paper, we introduce a direct nonparametric approach that bypasses density estimation by using the error rates of k-NN classifiers as "data-driven" basis functions, which can be combined to estimate a range of density functionals. However, this method is subject to a non-trivial bias that dramatically slows the rate of convergence in higher dimensions. To overcome this limitation, we develop an ensemble estimator of the basis functions which, under mild smoothness constraints on the underlying distributions, achieves the parametric rate of convergence regardless of the data dimension.
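The sketch below illustrates the general flavor of such an ensemble construction, not the paper's exact algorithm: cross-validated k-NN error rates serve as the "basis function" evaluations, and a weighted combination over several values of k is formed with weights chosen by a simple constrained least-squares heuristic that sums to one while suppressing lower-order bias terms. The function names, the choice of k values, and the weighting scheme are all illustrative assumptions.

```python
# Hedged sketch: ensemble of k-NN classifier error rates.
# Assumptions: scikit-learn's KNeighborsClassifier/cross_val_score as the
# error-rate oracle, and a heuristic bias-cancellation weighting; the paper
# derives its own weights and convergence guarantees.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def knn_error_rate(X, y, k, folds=5):
    """Cross-validated error rate of a k-NN classifier (the 'basis function')."""
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=folds)
    return 1.0 - acc.mean()


def ensemble_knn_error(X, y, ks=(1, 3, 5, 9, 17)):
    """Weighted combination of k-NN error rates across several k values."""
    n, d = X.shape
    errs = np.array([knn_error_rate(X, y, k) for k in ks])
    # Heuristic weights: sum to one (first constraint) while zeroing terms
    # proportional to (k/n)^(j/d), a stand-in for the leading bias terms.
    powers = np.arange(1, len(ks)) / d
    A = np.vstack([np.ones(len(ks))] +
                  [(np.array(ks) / n) ** p for p in powers])
    b = np.zeros(len(ks))
    b[0] = 1.0
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(w @ errs)


# Usage: two Gaussian classes in 3 dimensions; the ensemble error rate tracks
# the Bayes error, from which divergence-type functionals can be bounded.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (500, 3)), rng.normal(1.0, 1.0, (500, 3))])
y = np.repeat([0, 1], 500)
print(ensemble_knn_error(X, y))
```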
