Paper Information - Multiclass Classification Calibration Functions

In this paper we refine the process of computing calibration functions for a number of multiclass classification surrogate losses. Calibration functions are a powerful tool for converting bounds on the surrogate risk (which can be computed through well-known methods) into bounds on the true risk, that is, the probability of making a mistake. They are particularly suitable in non-parametric settings, where the approximation error can be controlled, and they provide tighter bounds than the common technique of upper-bounding the 0-1 loss by the surrogate loss. The more sophisticated existing calibration function results are abstract, so calibration functions must be derived explicitly on a case-by-case basis, and the effort has to be repeated whenever bounds for a new surrogate loss are needed. We devise a streamlined analysis that simplifies the derivation of calibration functions for a large number of surrogate losses that have been proposed in the literature. Deriving a calibration function then reduces to verifying, for a chosen surrogate loss, a small number of conditions that we introduce. As case studies, we recover existing calibration functions for the well-known loss of Lee et al. (2004), and we provide novel calibration functions for other well-known losses, including the one-versus-all loss and the logistic regression loss, as well as for a number of losses that were previously shown to be classification-calibrated but for which no calibration function had been derived.
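
To make the role of a calibration function concrete, the defining property can be written as the implication below. This is the standard formulation from the calibration literature, given here as a sketch; the symbols are assumptions and need not match the paper's own notation: R_ℓ is the surrogate risk, R is the true (0-1) risk, δ is the calibration function, and the infima range over all measurable predictors.

```latex
% Assumed notation (not taken from the paper):
%   R_\ell : surrogate risk,  R : true (0-1) risk,  \delta : calibration function
\[
  R_{\ell}(f) - \inf_{g} R_{\ell}(g) < \delta(\varepsilon)
  \;\Longrightarrow\;
  R(f) - \inf_{g} R(g) < \varepsilon
  \qquad \text{for all } \varepsilon > 0 .
\]
% If \delta is continuous and strictly increasing, this rearranges to
\[
  R(f) - \inf_{g} R(g)
  \;\le\;
  \delta^{-1}\!\Bigl( R_{\ell}(f) - \inf_{g} R_{\ell}(g) \Bigr).
\]
```

Under these assumptions, any bound on the excess surrogate risk translates into a bound on the excess true risk by applying the inverse of δ.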

Csaba Szepesvári | Bernardo Ávila Pires
