Multiclass Classification Calibration Functions

In this paper we refine the process of computing calibration functions for a number of multiclass classification surrogate losses. Calibration functions are a powerful tool for easily converting bounds for the surrogate risk (which can be computed through well-known methods) into bounds for the true risk, the probability of making a mistake. They are particularly suitable in non-parametric settings, where the approximation error can be controlled, and provide tighter bounds than the common technique of upper-bounding the 0-1 loss by the surrogate loss. The abstract nature of the more sophisticated existing calibration function results requires calibration functions to be explicitly derived on a case-by-case basis, requiring repeated efforts whenever bounds for a new surrogate loss are required. We devise a streamlined analysis that simplifies the process of deriving calibration functions for a large number of surrogate losses that have been proposed in the literature. The effort of deriving calibration functions is then surmised in verifying, for a chosen surrogate loss, a small number of conditions that we introduce. As case studies, we recover existing calibration functions for the well-known loss of Lee et al. (2004), and also provide novel calibration functions for well-known losses, including the one-versus-all loss and the logistic regression loss, plus a number of other losses that have been shown to be classification-calibrated in the past, but for which no calibration function had been derived.

[1]  Hans Ulrich Simon,et al.  Robust Trainability of Single Neurons , 1995, J. Comput. Syst. Sci..

[2]  Philip M. Long,et al.  Consistency versus Realizable H-Consistency for Multiclass Classification , 2013, ICML.

[3]  Yi Lin A note on margin-based loss functions in classification , 2004 .

[4]  Peter L. Bartlett,et al.  Boosting Algorithms as Gradient Descent , 1999, NIPS.

[5]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[6]  Yufeng Liu,et al.  Fisher Consistency of Multicategory Support Vector Machines , 2007, AISTATS.

[7]  Scott Sanner,et al.  Algorithms for Direct 0-1 Loss Optimization in Binary Classification , 2013, ICML.

[8]  Jason Weston,et al.  Multi-Class Support Vector Machines , 1998 .

[9]  Patrick Gallinari,et al.  Calibration and regret bounds for order-preserving surrogate losses in learning to rank , 2013, Machine Learning.

[10]  Shai Ben-David,et al.  On the difficulty of approximately maximizing agreements , 2000, J. Comput. Syst. Sci..

[11]  Ambuj Tewari,et al.  Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses , 2013, NIPS.

[12]  Frank Nielsen,et al.  Bregman Divergences and Surrogates for Learning , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Ingo Steinwart How to Compare Different Loss Functions and Their Risks , 2007 .

[14]  Christian Igel,et al.  A Unified View on Multi-class Support Vector Classification , 2016, J. Mach. Learn. Res..

[15]  David J. Kriegman,et al.  Guess-Averse Loss Functions For Cost-Sensitive Multiclass Boosting , 2014, ICML.

[16]  H. Zou The Margin Vector , Admissible Loss and Multi-class Margin-based Classifiers , 2005 .

[17]  Tao Sun,et al.  Consistency of Multiclass Empirical Risk Minimization Methods Based on Convex Loss , 2006, J. Mach. Learn. Res..

[18]  S. Boucheron,et al.  Theory of classification : a survey of some recent advances , 2005 .

[19]  A. Raftery,et al.  Strictly Proper Scoring Rules, Prediction, and Estimation , 2007 .

[20]  Yi Lin Multicategory Support Vector Machines, Theory, and Application to the Classification of . . . , 2003 .

[21]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[22]  Andreas Christmann,et al.  Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.

[23]  Zhenhua Wang,et al.  A Hybrid Loss for Multiclass and Structured Prediction , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Tong Zhang,et al.  Statistical Analysis of Some Multi-Category Large Margin Classification Methods , 2004, J. Mach. Learn. Res..

[25]  Koby Crammer,et al.  Ultraconservative Online Algorithms for Multiclass Problems , 2001, J. Mach. Learn. Res..

[26]  Prasad Raghavendra,et al.  Agnostic Learning of Monomials by Halfspaces Is Hard , 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[27]  Shivani Agarwal,et al.  Convex Calibration Dimension for Multiclass Loss Matrices , 2014, J. Mach. Learn. Res..

[28]  V. Koltchinskii,et al.  Oracle inequalities in empirical risk minimization and sparse recovery problems , 2011 .

[29]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[30]  Mark D. Reid,et al.  Surrogate regret bounds for proper losses , 2009, ICML '09.

[31]  Mark D. Reid,et al.  Composite Binary Losses , 2009, J. Mach. Learn. Res..

[32]  Ambuj Tewari,et al.  On the Consistency of Multiclass Classification Methods , 2007, J. Mach. Learn. Res..

[33]  Shivani Agarwal,et al.  Classification Calibration Dimension for General Multiclass Losses , 2012, NIPS.

[34]  Csaba Szepesvári,et al.  Cost-sensitive Multiclass Classification Risk Bounds , 2013, ICML.

[35]  Michael I. Jordan,et al.  Convexity, Classification, and Risk Bounds , 2006 .

[36]  Shai Ben-David,et al.  Understanding Machine Learning: From Theory to Algorithms , 2014 .

[37]  Lorenzo Rosasco,et al.  Multiclass Learning with Simplex Coding , 2012, NIPS.

[38]  Ryan M. Rifkin,et al.  In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res..

[39]  E. Mammen,et al.  Smooth Discrimination Analysis , 1999 .

[40]  Klaus Nordhausen,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition by Trevor Hastie, Robert Tibshirani, Jerome Friedman , 2009 .