Nonparametric estimation of the likelihood ratio and divergence functionals

We develop and analyze a nonparametric method for estimating f-divergence functionals and the density ratio of two probability distributions. Our method is based on a non-asymptotic variational characterization of the f-divergence, which lets us cast divergence estimation as a risk minimization problem. This yields an M-estimator for divergences, defined by a convex and differentiable optimization problem that can be solved efficiently. We analyze the consistency and convergence rates of this M-estimator under conditions only on the ratio of densities.
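
As an illustration of the variational idea, consider the Kullback-Leibler special case, where D(P, Q) = sup over g > 0 of E_P[log g] - E_Q[g] + 1, with the supremum attained at g = p/q. The sketch below maximizes the empirical version of this objective over a two-parameter log-linear family g(x) = exp(a + b*x). The Gaussian test data, the log-linear family, the damped Newton solver, and all function names are assumptions made for this example only; they are not taken from the paper, whose estimator is nonparametric.

```python
import math
import random


def fit_log_linear_ratio(xs_p, ys_q, steps=25):
    """Maximize (1/n) sum_i log g(x_i) - (1/m) sum_j g(y_j) + 1
    over g(x) = exp(a + b*x), with x_i drawn from P and y_j from Q.
    Returns (a, b, objective value); the value estimates KL(P || Q)."""
    n, m = len(xs_p), len(ys_q)
    mean_x = sum(xs_p) / n

    def objective(a, b):
        log_term = a + b * mean_x  # sample mean of log g under P
        exp_term = sum(math.exp(a + b * y) for y in ys_q) / m
        return log_term - exp_term + 1.0

    a, b = 0.0, 0.0
    for _ in range(steps):
        g = [math.exp(a + b * y) for y in ys_q]
        s0 = sum(g) / m
        s1 = sum(gi * y for gi, y in zip(g, ys_q)) / m
        s2 = sum(gi * y * y for gi, y in zip(g, ys_q)) / m
        grad_a, grad_b = 1.0 - s0, mean_x - s1
        if abs(grad_a) + abs(grad_b) < 1e-10:
            break  # stationary point of the concave objective
        # Newton ascent direction: solve [[s0, s1], [s1, s2]] d = grad.
        det = s0 * s2 - s1 * s1
        da = (s2 * grad_a - s1 * grad_b) / det
        db = (s0 * grad_b - s1 * grad_a) / det
        # Backtracking keeps every accepted step from decreasing the objective.
        t, current = 1.0, objective(a, b)
        while objective(a + t * da, b + t * db) < current and t > 1e-8:
            t *= 0.5
        a, b = a + t * da, b + t * db
    return a, b, objective(a, b)


def demo(seed=0, n=4000):
    """Hypothetical check: P = N(0, 1), Q = N(1, 1), so KL(P || Q) = 0.5
    and log(p/q)(x) = 0.5 - x lies inside the log-linear family."""
    rng = random.Random(seed)
    xs_p = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys_q = [rng.gauss(1.0, 1.0) for _ in range(n)]
    return fit_log_linear_ratio(xs_p, ys_q)


if __name__ == "__main__":
    a_hat, b_hat, kl_hat = demo()
    print("log-ratio intercept:", a_hat, "slope:", b_hat, "KL estimate:", kl_hat)
```

Because the objective is concave in (a, b), the fit returns both an estimate of the density ratio (through a and b) and an estimate of the divergence (the maximized value), mirroring the pairing described in the abstract. A richer function class would need some form of regularization, which this sketch omits.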
