Efficient two-sample functional estimation and the super-oracle phenomenon

We consider the estimation of two-sample integral functionals, of the type that occurs naturally, for example, when the object of interest is a divergence between unknown probability densities. Our first main result is that, in wide generality, a weighted nearest neighbour estimator is efficient, in the sense of achieving the local asymptotic minimax lower bound. Moreover, we prove a corresponding central limit theorem, which facilitates the construction of asymptotically valid confidence intervals for the functional, having asymptotically minimal width. One interesting consequence of our results is the discovery that, for certain functionals, the worst-case performance of our estimator may improve on that of the natural 'oracle' estimator, which is given access to the values of the unknown densities at the observations.
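To make the setting concrete, the following is a minimal sketch of the classical (unweighted) k-nearest-neighbour estimator of the Kullback–Leibler divergence, a prototypical two-sample integral functional. This is an illustration of the general idea only, not the paper's efficient weighted estimator; the function name and the brute-force distance computation are assumptions made for the sketch.

```python
import numpy as np

def knn_kl_divergence(x, y, k=1):
    """Classical k-NN estimate of KL(p || q) from samples x ~ p, y ~ q.

    Illustrative sketch of the unweighted nearest-neighbour approach;
    the weighted, efficient estimator studied in the paper is different.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = x.shape
    m = y.shape[0]

    # rho[i]: distance from x[i] to its k-th nearest neighbour in x \ {x[i]}
    dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dxx, np.inf)  # exclude the point itself
    rho = np.sort(dxx, axis=1)[:, k - 1]

    # nu[i]: distance from x[i] to its k-th nearest neighbour in y
    dxy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    nu = np.sort(dxy, axis=1)[:, k - 1]

    # Plug-in formula: d * mean(log(nu/rho)) + log(m / (n - 1))
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

For identical underlying densities the estimate concentrates near zero, while a location shift between the two samples produces a clearly positive estimate; this is the kind of divergence functional for which the abstract's efficiency and confidence-interval results apply.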