The Convexity and Design of Composite Multiclass Losses

We consider composite loss functions for multiclass prediction comprising a proper (i.e., Fisher-consistent) loss over probability distributions and an inverse link function. We establish conditions for their (strong) convexity and explore the implications. We also show how the separation of concerns afforded by using this composite representation allows for the design of families of losses with the same Bayes risk.
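A minimal sketch of the composite construction described above. It assumes log loss as the proper loss. It also assumes a multinomial-logit inverse link that maps a score vector v in R^{n-1} to the interior of the simplex, with the last class as reference. All function names are illustrative assumptions, not the paper's notation. So is the `warp` parameter, a strictly increasing bijection of R used here only to generate alternative links onto the same probability vectors. Any such bijective link leaves the minimal conditional risk unchanged, so both links below share the Bayes risk of log loss (the entropy of eta). The midpoint test then shows that convexity of the composite loss depends on the link: it holds for the plain logit link but fails on one segment for the cubically warped link.

```python
import math
import random


def identity(x):
    return x


def cube(x):
    return x ** 3


def cube_root(x):
    return math.copysign(abs(x) ** (1 / 3), x)


def inverse_link(v, warp=identity):
    """Hypothetical inverse link psi^{-1}: R^{n-1} -> interior of the simplex.

    Multinomial-logit map with the last class as reference (score 0).
    `warp` (assumed strictly increasing and onto R) yields different links
    with the same image.
    """
    scores = [warp(x) for x in v] + [0.0]
    m = max(scores)  # shift by the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def link(p, unwarp=identity):
    """Inverse of inverse_link for interior p; `unwarp` must invert `warp`."""
    return [unwarp(math.log(p_i / p[-1])) for p_i in p[:-1]]


def log_loss(p, y):
    """Proper loss lambda_y(p) = -log p_y."""
    return -math.log(p[y])


def composite_loss(v, y, warp=identity):
    """Composite loss ell_y(v) = lambda_y(psi^{-1}(v))."""
    return log_loss(inverse_link(v, warp), y)


def conditional_risk(eta, v, warp=identity):
    """Expected composite loss when the label is drawn from eta."""
    return sum(e * composite_loss(v, y, warp) for y, e in enumerate(eta))


def bayes_risk(eta):
    """Bayes risk of log loss: the Shannon entropy of eta (in nats)."""
    return -sum(e * math.log(e) for e in eta if e > 0)


def midpoint_convex(f, a, b, tol=1e-9):
    """Check f((a+b)/2) <= (f(a)+f(b))/2 on one segment (necessary, not sufficient)."""
    mid = [(x + z) / 2 for x, z in zip(a, b)]
    return f(mid) <= (f(a) + f(b)) / 2 + tol


if __name__ == "__main__":
    eta = [0.5, 0.3, 0.2]
    # Same proper loss, two links: the conditional risk at v* = psi(eta)
    # equals the entropy of eta in both cases.
    for warp, unwarp in [(identity, identity), (cube, cube_root)]:
        v_star = link(eta, unwarp)
        print(conditional_risk(eta, v_star, warp), bayes_risk(eta))

    # Plain logit link: the midpoint test passes on random segments.
    rng = random.Random(0)
    logit_ok = True
    for _ in range(1000):
        a = [rng.uniform(-3, 3) for _ in range(2)]
        b = [rng.uniform(-3, 3) for _ in range(2)]
        for y in range(3):
            logit_ok &= midpoint_convex(lambda v: composite_loss(v, y), a, b)
    print(logit_ok)

    # Cubic warp: same Bayes risk, but convexity fails on this segment.
    print(midpoint_convex(lambda v: composite_loss(v, 0, cube),
                          [0.0, 0.0], [0.5, 0.0]))
```

A midpoint check on finitely many segments can refute convexity but cannot establish it. It is a sanity check, not a substitute for an analytic convexity condition.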
