Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization

We develop and analyze an algorithm for nonparametric estimation of divergence functionals and the density ratio of two probability distributions. Our method is based on a variational characterization of f-divergences, which turns the estimation into a penalized convex risk minimization problem. We present a derivation of our kernel-based estimation algorithm and an analysis of convergence rates for the estimator. Our simulation results demonstrate the convergence behavior of the method, which compares favorably with existing methods in the literature.
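
To make the variational idea concrete, consider the Kullback-Leibler divergence, which admits the representation KL(P || Q) = sup over g > 0 of E_P[log g] - E_Q[g] + 1, with the supremum attained at the density ratio g = p/q. Replacing expectations by sample means and adding a penalty gives a convex minimization problem. The sketch below is only an illustration under simplifying assumptions and is not the kernel-based estimator of the paper: it swaps the reproducing kernel Hilbert space for a hypothetical two-parameter log-linear class g(x) = exp(a + b x), uses a plain ridge penalty on b, and solves the problem by fixed-step gradient descent. The function name `estimate_kl` and all parameter defaults are assumptions chosen for the example.

```python
import math
import random


def estimate_kl(xs_p, xs_q, lam=0.01, step=0.1, iters=1000):
    """Estimate KL(P || Q) from samples via penalized convex risk minimization.

    Minimizes  mean_Q[g] - mean_P[log g] + (lam / 2) * b**2
    over the log-linear class g(x) = exp(a + b * x)  (an illustrative
    stand-in for an RKHS), then plugs the fitted g into the variational
    bound  mean_P[log g] - mean_Q[g] + 1.
    Returns (kl_estimate, a, b).
    """
    a, b = 0.0, 0.0
    mean_p = sum(xs_p) / len(xs_p)  # mean_P[log g] = a + b * mean_p
    n_q = len(xs_q)
    for _ in range(iters):
        g_q = [math.exp(a + b * x) for x in xs_q]
        # Gradients of the penalized empirical risk (convex in a and b).
        grad_a = sum(g_q) / n_q - 1.0
        grad_b = sum(x * g for x, g in zip(xs_q, g_q)) / n_q - mean_p + lam * b
        a -= step * grad_a
        b -= step * grad_b
    mean_g_q = sum(math.exp(a + b * x) for x in xs_q) / n_q
    kl = (a + b * mean_p) - mean_g_q + 1.0
    return kl, a, b


if __name__ == "__main__":
    rng = random.Random(0)
    # P = N(1, 1), Q = N(0, 1): true ratio is exp(x - 1/2), true KL is 1/2.
    xs_p = [rng.gauss(1.0, 1.0) for _ in range(1000)]
    xs_q = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    kl, a, b = estimate_kl(xs_p, xs_q)
    print("KL estimate:", kl, "fitted ratio: exp(%.3f + %.3f x)" % (a, b))
```

For two unit-variance Gaussians the true density ratio lies inside the log-linear class, so the fitted g doubles as a density ratio estimate; for other distributions a richer function class (such as the kernel classes the abstract refers to) would be needed.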
