Convex Calibration Dimension for Multiclass Loss Matrices

We study consistency properties of surrogate loss functions for general multiclass learning problems, defined by a general multiclass loss matrix. We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be calibrated with respect to a loss matrix in this setting. We then introduce the notion of the convex calibration dimension of a multiclass loss matrix, which measures the smallest 'size' of a prediction space in which it is possible to design a convex surrogate that is calibrated with respect to the loss matrix. We derive both upper and lower bounds on this quantity, and use these results to analyze various loss matrices. In particular, we apply our framework to several subset ranking losses, using the convex calibration dimension as a tool to show both the existence and non-existence of various types of convex calibrated surrogates for these losses. Our results strengthen recent results of Duchi et al. (2010) and Calauzènes et al. (2012) on the non-existence of certain types of convex calibrated surrogates in subset ranking. We anticipate that the convex calibration dimension may prove to be a useful tool in the study and design of surrogate losses for general multiclass learning problems.
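To make the two central definitions concrete, the following is a minimal LaTeX sketch using notation standard in the classification-calibration literature; the symbols L, \psi, pred, and CCdim are our shorthand for exposition, not necessarily the paper's own.

% Setup: n class labels, k target predictions, and a loss matrix
% L \in \mathbb{R}_+^{n \times k}, where L_{y,t} is the loss of predicting t
% when the true label is y. A surrogate \psi : \mathbb{R}^d \to \mathbb{R}_+^n
% maps each point u of a d-dimensional prediction space to per-label losses.
% \psi is calibrated w.r.t. L if some decoding map
% pred : \mathbb{R}^d \to \{1, \dots, k\} guarantees that, at every label
% distribution p in the probability simplex \Delta_n, points decoding to a
% suboptimal target prediction incur strictly suboptimal surrogate risk:
\[
  \inf_{u \,:\, \mathrm{pred}(u) \,\notin\, \arg\min_{t} \sum_{y} p_y L_{y,t}}
    \sum_{y=1}^{n} p_y \psi_y(u)
  \;>\;
  \inf_{u \in \mathbb{R}^d} \sum_{y=1}^{n} p_y \psi_y(u)
  \qquad \text{for all } p \in \Delta_n .
\]
% The convex calibration dimension of L is then the smallest surrogate
% dimension in which a convex calibrated surrogate exists:
\[
  \mathrm{CCdim}(L) \;=\;
  \min\bigl\{ d \in \mathbb{N} \;:\;
    \exists\, \text{convex } \psi : \mathbb{R}^d \to \mathbb{R}_+^{n}
    \text{ calibrated w.r.t. } L \bigr\}.
\]

For instance, for the binary 0-1 loss matrix (n = k = 2), the hinge loss paired with the sign decoding map is a convex calibrated surrogate in dimension d = 1, so the convex calibration dimension of that matrix is 1.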

[1] I. Steinwart, et al. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 2005.

[2] P. Gallinari, et al. Learning Scoring Functions with Order-Preserving Losses and Standardized Supervision. ICML, 2011.

[3] Y. Lin. Multicategory Support Vector Machines, Theory, and Application to the Classification of Microarray Data and Satellite Radiance Data, 2003.

[4] J. C. Duchi, et al. On the Consistency of Ranking Algorithms. ICML, 2010.

[5] J. Weston, et al. Support vector machines for multi-class pattern recognition. ESANN, 1999.

[6] M. R. Gupta, et al. Cost-sensitive multi-class classification from probability estimates. ICML, 2008.

[7] G. Lugosi, et al. Ranking and empirical minimization of U-statistics. arXiv:math/0603123, 2006.

[8] E. Hüllermeier, et al. Optimizing the F-Measure in Multi-Label Classification: Plug-in Rule Approach versus Structured Loss Minimization. ICML, 2013.

[9] J. Kekäläinen, et al. IR evaluation methods for retrieving highly relevant documents. SIGIR Forum, 2000.

[10] C. Szepesvári, et al. Cost-sensitive Multiclass Classification Risk Bounds. ICML, 2013.

[11] K. Crammer, et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines. Journal of Machine Learning Research, 2002.

[12] M. I. Jordan, et al. Convexity, Classification, and Risk Bounds, 2006.

[13] T. Zhang, et al. Statistical Analysis of Bayes Optimal Subset Ranking. IEEE Transactions on Information Theory, 2008.

[14] C. Calauzènes, et al. On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking. NIPS, 2012.

[15] W. Jiang. Process consistency for AdaBoost, 2003.

[16] D. P. Bertsekas, et al. Convex Analysis and Optimization, 2003.

[17] J. Gallier. Notes on Convex Sets, Polytopes, Polyhedra, Combinatorial Topology, Voronoi Diagrams and Delaunay Triangulations. arXiv:0805.0292, 2008.

[18] G. Wahba, et al. Multicategory Support Vector Machines, Theory, and Application to the Classification of Microarray Data and Satellite Radiance Data, 2004.

[19] C. Scott. Calibrated asymmetric surrogate losses, 2012.

[20] G. Lugosi, et al. On the Bayes-risk consistency of regularized boosting methods, 2003.

[21] Z.-H. Zhou, et al. On the Consistency of Multi-Label Learning. COLT, 2011.

[22] E. Hüllermeier, et al. Bipartite Ranking through Minimization of Univariate Loss. ICML, 2011.

[23] I. Steinwart. How to Compare Different Loss Functions and Their Risks, 2007.

[24] M. D. Reid, et al. Composite Binary Losses. Journal of Machine Learning Research, 2009.

[25] Y. Freund, et al. A discussion of "Process consistency for AdaBoost" by Wenxin Jiang, "On the Bayes-risk consistency of regularized boosting methods" by Gábor Lugosi and Nicolas Vayatis, and "Statistical behavior and consistency of classification methods based on convex risk minimization" by Tong Zhang, 2004.

[26] A. Tewari, et al. On the Consistency of Multiclass Classification Methods. Journal of Machine Learning Research, 2007.

[27] P. Ravikumar, et al. On NDCG Consistency of Listwise Ranking Methods. AISTATS, 2011.

[28] A. Tewari, et al. Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses. NIPS, 2013.

[29] S. Agarwal, et al. Classification Calibration Dimension for General Multiclass Losses. NIPS, 2012.

[30] T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization, 2003.

[31] A. Tewari, et al. Consistent Algorithms for Multiclass Classification with a Reject Option. arXiv, 2015.

[32] S. Agarwal, et al. Surrogate regret bounds for bipartite ranking via strongly proper losses. Journal of Machine Learning Research, 2012.

[33] T.-Y. Liu, et al. Listwise approach to learning to rank: theory and algorithm. ICML, 2008.

[34] T. Zhang, et al. Statistical Analysis of Some Multi-Category Large Margin Classification Methods. Journal of Machine Learning Research, 2004.

[35] M. D. Reid, et al. Composite Multiclass Losses. Journal of Machine Learning Research, 2011.

[36] Y. Shoham, et al. Eliciting truthful answers to multiple-choice questions. EC, 2009.

[37] S. Clémençon, et al. Ranking the Best Instances. Journal of Machine Learning Research, 2006.

[38] M. Yuan, et al. Classification Methods with Reject Option Based on Convex Risk Minimization. Journal of Machine Learning Research, 2010.