Guess-Averse Loss Functions For Cost-Sensitive Multiclass Boosting

Cost-sensitive multiclass classification has recently gained significance in several applications through the introduction of multiclass datasets with well-defined misclassification costs. This work considers the design of classification algorithms for this setting. It is argued that the unreliable performance of current algorithms is due to the inability of their loss functions to enforce a fundamental property. This property, denoted guess-aversion, is that the loss should encourage correct classifications over the arbitrary guessing that ensues when all classes are equally scored by the classifier. While guess-aversion holds trivially for binary classification, this is not true in the multiclass setting. A new family of cost-sensitive guess-averse loss functions is derived and used to design new cost-sensitive multiclass boosting algorithms, denoted GEL-MCBoost and GLL-MCBoost. Extensive experiments demonstrate (1) the importance of guess-aversion and (2) that the GLL loss function outperforms other loss functions for multiclass boosting.
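
The guess-aversion property can be illustrated numerically. The sketch below is not taken from the paper: both loss functions, the cost matrix, and the helper `beats_guessing` are hypothetical, chosen only to show one loss that ranks a correct prediction below the all-equal-scores "guess" and one that does not. It reads guess-aversion informally as "any score vector whose top score belongs to the true class incurs a strictly smaller loss than the all-zero score vector", which is an assumption about the formal definition.

```python
import math

def pairwise_exp_loss(costs, y, scores):
    """Hypothetical cost-weighted exponential loss on score differences."""
    return sum(costs[y][j] * math.exp(scores[j] - scores[y])
               for j in range(len(scores)) if j != y)

def unnormalized_exp_loss(costs, y, scores):
    """Hypothetical loss that penalizes wrong-class scores but ignores the true-class score."""
    return sum(costs[y][j] * math.exp(scores[j])
               for j in range(len(scores)) if j != y)

def beats_guessing(loss, costs, y, scores):
    """True if `scores` has strictly lower loss than equal scores for all classes."""
    guess = [0.0] * len(scores)
    return loss(costs, y, scores) < loss(costs, y, guess)

# Illustrative cost matrix: COSTS[y][j] is the cost of predicting j when the truth is y.
COSTS = [[0, 1, 4],
         [2, 0, 1],
         [3, 1, 0]]

# Class 0 has the highest score, so the arg-max prediction is correct for y = 0.
CORRECT_SCORES = [3.0, 1.0, 2.0]

print(beats_guessing(pairwise_exp_loss, COSTS, 0, CORRECT_SCORES))
print(beats_guessing(unnormalized_exp_loss, COSTS, 0, CORRECT_SCORES))
```

Under these assumptions, the first loss rewards the correct prediction because every exponent is negative whenever the true class has the largest score, while the second can assign the correct prediction a higher loss than guessing, since it never compares wrong-class scores against the true-class score.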
