Quantifying Learning Guarantees for Convex but Inconsistent Surrogates

We study consistency properties of machine learning methods based on minimizing convex surrogates. We extend the recent framework of Osokin et al. (2017) for the quantitative analysis of consistency properties to the case of inconsistent surrogates. Our key technical contribution is a new lower bound on the calibration function for the quadratic surrogate, which is non-trivial (not always zero) in inconsistent cases. The new bound allows us to quantify the level of inconsistency of the setting and shows that learning with inconsistent surrogates can still come with guarantees on sample complexity and optimization difficulty. We apply our theory to two concrete cases: multi-class classification with the tree-structured loss and ranking with the mean average precision loss. The results illustrate the approximation-computation trade-offs caused by inconsistent surrogates and their potential benefits.
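For reference, a minimal sketch of the calibration function as defined in the framework of Osokin et al. (2017) [8], which this work extends; the notation below (score set F, task loss L, surrogate Phi, conditional label distribution q, decoding map pred, and the quadratic-surrogate form) is assumed for illustration rather than quoted from the paper:

% Excess task risk and excess surrogate risk of a score vector f under q:
\delta\ell(f, q) = \mathbb{E}_{y \sim q}\, L(\mathrm{pred}(f), y) - \min_{\hat y} \mathbb{E}_{y \sim q}\, L(\hat y, y)
\delta\Phi(f, q) = \mathbb{E}_{y \sim q}\, \Phi(f, y) - \inf_{g \in \mathcal{F}} \mathbb{E}_{y \sim q}\, \Phi(g, y)
% Calibration function: smallest surrogate excess risk compatible with a task excess risk of at least eps.
H_{\Phi, L, \mathcal{F}}(\varepsilon) = \inf_{f \in \mathcal{F},\, q} \; \delta\Phi(f, q) \quad \text{s.t.} \quad \delta\ell(f, q) \ge \varepsilon
% One common form of the quadratic surrogate considered in [8] (up to constants), with k the number of outputs:
\Phi_{\mathrm{quad}}(f, y) = \tfrac{1}{2k} \sum_{\hat y} \big( f(\hat y) + L(\hat y, y) \big)^2

Consistency corresponds to H(\varepsilon) > 0 for every \varepsilon > 0; for an inconsistent surrogate, H vanishes on an initial interval of \varepsilon, and a lower bound that becomes positive beyond some threshold is what makes the level of inconsistency quantifiable.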

[1] Ambuj Tewari, et al. On the Consistency of Multiclass Classification Methods, 2007, J. Mach. Learn. Res.

[2] Thomas Hofmann, et al. Large Margin Methods for Structured and Interdependent Output Variables, 2005, J. Mach. Learn. Res.

[3] Tong Zhang, et al. Statistical Analysis of Some Multi-Category Large Margin Classification Methods, 2004, J. Mach. Learn. Res.

[4] Shivani Agarwal, et al. Convex Calibration Dimension for Multiclass Loss Matrices, 2014, J. Mach. Learn. Res.

[5] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization, 2003.

[6] Ingo Steinwart. How to Compare Different Loss Functions and Their Risks, 2007.

[7] Francis R. Bach, et al. On the Consistency of Ordinal Regression Methods, 2014, J. Mach. Learn. Res.

[8] Francis R. Bach, et al. On Structured Prediction Theory with Calibrated Convex Surrogate Losses, 2017, NIPS.

[9] Lorenzo Rosasco, et al. A Consistent Regularization Approach for Structured Prediction, 2016, NIPS.

[10] Patrick Gallinari, et al. Learning Scoring Functions with Order-Preserving Losses and Standardized Supervision, 2011, ICML.

[11] Philip M. Long, et al. Consistency versus Realizable H-Consistency for Multiclass Classification, 2013, ICML.

[12] Patrick Gallinari, et al. On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking, 2012, NIPS.

[13] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.

[14] W. Dorn. Duality in Quadratic Programming, 2011.

[15] A. Choromańska. Extreme Multi Class Classification, 2013.

[16] Csaba Szepesvári, et al. Cost-sensitive Multiclass Classification Risk Bounds, 2013, ICML.

[17] Michael I. Jordan, et al. Convexity, Classification, and Risk Bounds, 2006.

[18] Ben Taskar, et al. Max-Margin Markov Networks, 2003, NIPS.

[19] Michael I. Jordan, et al. On the Consistency of Ranking Algorithms, 2010, ICML.

[20] Ambuj Tewari, et al. Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses, 2013, NIPS.

[21] Alexander Shapiro, et al. Stochastic Approximation Approach to Stochastic Programming, 2013.

[22] Alessandro Rudi, et al. Exponential convergence of testing error for stochastic gradient methods, 2017, COLT.

[23] Nathan Srebro, et al. Minimizing The Misclassification Error Rate Using a Surrogate Convex Loss, 2012, ICML.

[24] Koby Crammer, et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, 2002, J. Mach. Learn. Res.