On surrogate loss functions and f-divergences

The goal of binary classification is to estimate a discriminant function γ from observations of covariate vectors and corresponding binary labels. We consider an elaboration of this problem in which the covariates are not available directly but are transformed by a dimensionality-reducing quantizer Q. We present conditions on loss functions such that empirical risk minimization yields Bayes consistency when both the discriminant function and the quantizer are estimated. These conditions are stated in terms of a general correspondence between loss functions and a class of functionals known as Ali-Silvey or f-divergence functionals. Whereas this correspondence was established by Blackwell [Proc. 2nd Berkeley Symp. Probab. Statist. 1 (1951) 93-102. Univ. California Press, Berkeley] for the 0-1 loss, we extend the correspondence to the broader class of surrogate loss functions that play a key role in the general theory of Bayes consistency for binary classification. Our result makes it possible to pick out the (strict) subset of surrogate loss functions that yield Bayes consistency for joint estimation of the discriminant function and the quantizer.
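
To make the correspondence concrete, here is a brief sketch of its form; the notation (quantizer output Z, induced measures μ and π, surrogate loss φ) is ours and is not fixed by the abstract itself. An f-divergence between the measures induced on the quantized space by Q,

\[
I_f(\mu, \pi) \;=\; \sum_{z} \pi(z)\, f\!\left(\frac{\mu(z)}{\pi(z)}\right),
\qquad \mu(z) = \mathbb{P}(Y = 1,\, Z = z), \quad \pi(z) = \mathbb{P}(Y = -1,\, Z = z),
\]

with f convex, is paired with a surrogate loss φ through the optimal φ-risk: writing R_φ(γ, Q) = E[φ(Y γ(Z))], minimizing over discriminant functions gives

\[
\inf_{\gamma} R_\phi(\gamma, Q) \;=\; -\, I_f(\mu, \pi)
\]

for an f determined by φ. For instance, the 0-1 and hinge losses correspond (up to affine transformations) to the variational distance, and the exponential loss to the Hellinger distance.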

[1]  N. Aronszajn, Theory of Reproducing Kernels, 1950.

[2]  D. Blackwell, Comparison of Experiments, 1951.

[3]  D. Blackwell, Equivalent Comparisons of Experiments, 1953.

[4]  R. N. Bradt, On the Design and Comparison of Certain Dichotomous Experiments, 1954.

[5]  S. M. Ali, et al., A General Class of Coefficients of Divergence of One Distribution from Another, 1966.

[6]  T. Kailath, The Divergence and Bhattacharyya Distance Measures in Signal Selection, 1967.

[7]  D. Varberg, Convex Functions, 1973.

[8]  H. V. Poor, et al., Applications of Ali-Silvey Distance Measures in the Design of Generalized Quantizers for Binary Decision Systems, 1977, IEEE Trans. Commun.

[9]  丸山 徹, On Some Developments in Convex Analysis, 1977.

[10]  Colin McDiarmid, et al., Surveys in Combinatorics, 1989: On the method of bounded differences, 1989.

[11]  R. Phelps, Convex Functions, Monotone Operators and Differentiability, 1989.

[12]  Maurizio Longo, et al., Quantization for decentralized hypothesis testing under communication constraints, 1990, IEEE Trans. Inf. Theory.

[13]  Thomas M. Cover, et al., Elements of Information Theory, 2005.

[14]  J. Tsitsiklis, Decentralized Detection, 1993.

[15]  John N. Tsitsiklis, et al., Extremal properties of likelihood-ratio quantizers, 1993, IEEE Trans. Commun.

[16]  Jon A. Wellner, et al., Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[17]  Yoav Freund, et al., A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[18]  Rick S. Blum, et al., Distributed detection with multiple sensors I. Advanced topics, 1997, Proc. IEEE.

[19]  Alexander J. Smola, et al., Learning with kernels, 1998.

[20]  Flemming Topsøe, et al., Some inequalities for information divergence and related measures of discrimination, 2000, IEEE Trans. Inf. Theory.

[21]  J. Friedman, Special Invited Paper: Additive logistic regression: A statistical view of boosting, 2000.

[22]  Venugopal V. Veeravalli, et al., Decentralized detection in sensor networks, 2003, IEEE Trans. Signal Process.

[23]  Wenxin Jiang, Process consistency for AdaBoost, 2003.

[24]  Tong Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, 2003.

[25]  Shie Mannor, et al., Greedy Algorithms for Classification: Consistency, Convergence Rates, and Adaptivity, 2003, J. Mach. Learn. Res.

[26]  G. Lugosi, et al., On the Bayes-risk consistency of regularized boosting methods, 2003.

[27]  Chee-Yee Chong, et al., Sensor networks: evolution, opportunities, and challenges, 2003, Proc. IEEE.

[28]  Corinna Cortes, et al., Support-Vector Networks, 1995, Machine Learning.

[29]  Marion Kee, et al., Analysis, 2004, Machine Translation.

[30]  Michael I. Jordan, et al., Nonparametric decentralized detection using kernel methods, 2005, IEEE Transactions on Signal Processing.

[31]  Ingo Steinwart, Consistency of support vector machines and other regularized kernel classifiers, 2005, IEEE Transactions on Information Theory.

[32]  Michael I. Jordan, et al., Convexity, Classification, and Risk Bounds, 2006.