Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization

We develop and analyze M-estimation methods for divergence functionals and the likelihood ratios of two probability distributions. Our method is based on a nonasymptotic variational characterization of f-divergences, which allows the problem of estimating divergences to be tackled via convex empirical risk optimization. The resulting estimators are simple to implement, requiring only the solution of standard convex programs. We present an analysis of consistency and convergence for these estimators. Given conditions only on the ratios of densities, we show that our estimators can achieve optimal minimax rates for the likelihood ratio and the divergence functionals in certain regimes. We derive an efficient optimization algorithm for computing our estimates, and illustrate their convergence behavior and practical viability by simulations.
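As a rough illustration of the idea in the abstract, the sketch below estimates a KL divergence by maximizing an empirical variational lower bound, which is a concave problem. It relies on the elementary identity KL(P||Q) = sup over g > 0 of E_P[log g] - E_Q[g] + 1, attained at g = p/q. Everything else is an assumption made for the example and is not taken from the paper: the log-linear function class log g(x) = a + b*x, the damped Newton solver, the Gaussian toy data, and the hypothetical names `lower_bound`, `estimate_kl`, and `demo`.

```python
import math
import random


def lower_bound(a, b, xs_p, ys_q):
    """Empirical value of mean_P[log g] - mean_Q[g] + 1 with log g(x) = a + b*x."""
    term_p = sum(a + b * x for x in xs_p) / len(xs_p)
    term_q = sum(math.exp(a + b * y) for y in ys_q) / len(ys_q)
    return term_p - term_q + 1.0


def estimate_kl(xs_p, ys_q, max_iter=50, tol=1e-9):
    """Maximize the concave lower bound over (a, b) with damped Newton steps.

    Returns (kl_estimate, a, b); exp(a + b*x) is the fitted density ratio p/q.
    """
    a, b = 0.0, 0.0
    mean_p = sum(xs_p) / len(xs_p)
    n_q = len(ys_q)
    for _ in range(max_iter):
        g = [math.exp(a + b * y) for y in ys_q]
        s0 = sum(g) / n_q
        s1 = sum(gi * y for gi, y in zip(g, ys_q)) / n_q
        s2 = sum(gi * y * y for gi, y in zip(g, ys_q)) / n_q
        # Gradient of the objective with respect to (a, b).
        grad_a = 1.0 - s0
        grad_b = mean_p - s1
        if abs(grad_a) + abs(grad_b) < tol:
            break
        # The Hessian is -[[s0, s1], [s1, s2]]; solve the 2x2 Newton system by hand.
        det = s0 * s2 - s1 * s1
        step_a = (s2 * grad_a - s1 * grad_b) / det
        step_b = (s0 * grad_b - s1 * grad_a) / det
        # Backtracking: halve the step until the objective does not decrease.
        t = 1.0
        current = lower_bound(a, b, xs_p, ys_q)
        while t > 1e-8 and lower_bound(a + t * step_a, b + t * step_b, xs_p, ys_q) < current:
            t *= 0.5
        a += t * step_a
        b += t * step_b
    return lower_bound(a, b, xs_p, ys_q), a, b


def demo(seed=0, n=4000):
    """Toy data (an assumption for this example): P = N(0, 1), Q = N(1, 1)."""
    rng = random.Random(seed)
    xs_p = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys_q = [rng.gauss(1.0, 1.0) for _ in range(n)]
    return estimate_kl(xs_p, ys_q)


if __name__ == "__main__":
    kl_hat, a, b = demo()
    print("estimated KL(P||Q):", kl_hat)
    print("fitted log-ratio: a =", a, ", b =", b)
```

For this toy pair the true divergence is 0.5 and the true log-ratio is 0.5 - x, so the fitted values should land near a = 0.5 and b = -1. The log-linear class happens to contain the true ratio here; the function classes and the algorithm analyzed in the paper are not specified in this abstract and may differ from this sketch.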

Martin J. Wainwright | Michael I. Jordan | XuanLong Nguyen

[1] G. C. Hood. Estimation of Entropy , 1953 .

[2] S. M. Ali,et al. A General Class of Coefficients of Divergence of One Distribution from Another , 1966 .

[3] M. Birman,et al. Piecewise-polynomial approximations of functions of the classes $W_{p}^{\alpha}$ , 1967 .

[4] T. Kailath. The Divergence and Bhattacharyya Distance Measures in Signal Selection , 1967 .

[5] Toru Maruyama (丸山徹). On two or three developments in convex analysis (Convex Analysisの二,三の進展について) , 1977 .

[6] B. Silverman,et al. On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method , 1982 .

[7] I. Ibragimov,et al. On Nonparametric Estimation of the Value of a Linear Functional in Gaussian White Noise , 1985 .

[8] L. Györfi,et al. Density-free convergence properties of various estimators of entropy , 1987 .

[9] Saburou Saitoh,et al. Theory of Reproducing Kernels and Its Applications , 1988 .

[10] H. Joe. Estimation of entropy and other functionals of a multivariate density , 1989 .

[11] H. Joe. Relative Entropy Measures of Multivariate Dependence , 1989 .

[12] D. Donoho,et al. Geometrizing Rates of Convergence, III , 1991 .

[13] Thomas M. Cover,et al. Elements of Information Theory , 2005 .

[14] P. Comon. Independent Component Analysis , 1992 .

[15] J. Hiriart-Urruty,et al. Convex analysis and minimization algorithms , 1993 .

[16] J. Tsitsiklis. Decentralized Detection , 1993 .

[17] P. Hall,et al. On the estimation of entropy , 1993 .

[18] Pierre Comon,et al. Independent component analysis, A new concept? , 1994, Signal Process..

[19] P. Massart,et al. Estimation of Integral Functionals of a Density , 1995 .

[20] B. Laurent. Efficient estimation of integral functionals of a density , 1996 .

[21] G. Kerkyacharian,et al. Estimating nonquadratic functionals of a density using Haar wavelets , 1996 .

[22] Jon A. Wellner,et al. Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[23] Bin Yu. Assouad, Fano, and Le Cam , 1997 .

[24] Alexander J. Smola,et al. Learning with kernels , 1998 .

[25] A. V. D. Vaart,et al. Asymptotic Statistics , 1998 .

[26] Yuhong Yang,et al. Information-theoretic determination of minimax rates of convergence , 1999 .

[27] Flemming Topsøe,et al. Some inequalities for information divergence and related measures of discrimination , 2000, IEEE Trans. Inf. Theory.

[28] S. Geer. Empirical Processes in M-Estimation , 2000 .

[29] S. R. Jammalamadaka,et al. Empirical Processes in M-Estimation , 2001 .

[30] E. Oja,et al. Independent Component Analysis , 2013 .

[31] Felipe Cucker,et al. On the mathematical foundations of learning , 2001 .

[32] Ding-Xuan Zhou,et al. The covering number in learning theory , 2002, J. Complex..

[33] A. Keziou. Dual representation of Φ-divergences and applications , 2003 .

[34] Nello Cristianini,et al. Kernel Methods for Pattern Analysis , 2003, ICTAI.

[35] Martin J. Wainwright,et al. On surrogate loss functions and f-divergences , 2005, math/0510521.

[36] Martin J. Wainwright,et al. On divergences, surrogate loss functions, and decentralized detection , 2005, ArXiv.

[37] Qing Wang,et al. Divergence estimation of continuous distributions based on data-dependent partitions , 2005, IEEE Transactions on Information Theory.

[38] Michael I. Jordan,et al. Convexity, Classification, and Risk Bounds , 2006 .

[39] Igor Vajda,et al. On Divergences and Informations in Statistics and Information Theory , 2006, IEEE Transactions on Information Theory.

[40] Sanjeev R. Kulkarni,et al. A Nearest-Neighbor Approach to Estimating Divergence between Continuous Random Vectors , 2006, 2006 IEEE International Symposium on Information Theory.

[41] Martin J. Wainwright,et al. Nonparametric estimation of the likelihood ratio and divergence functionals , 2007, 2007 IEEE International Symposium on Information Theory.

[42] Martin J. Wainwright,et al. Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization , 2007, NIPS.

[43] A. Keziou,et al. On empirical likelihood for semiparametric two-sample density ratio models , 2008 .

[44] D. Donoho,et al. Geometrizing Rates of Convergence, II , 2008 .

[45] Le Song,et al. Relative Novelty Detection , 2009, AISTATS.

[46] Michel Broniatowski,et al. Parametric estimation and tests through divergences and the duality technique , 2008, J. Multivar. Anal..

[47] Seungjin Choi,et al. Independent Component Analysis , 2009, Handbook of Natural Computing.