Consistent Robust Adversarial Prediction for General Multiclass Classification

We propose a robust adversarial prediction framework for general multiclass classification. Our method seeks predictive distributions that robustly optimize non-convex and non-continuous multiclass loss metrics against the worst-case conditional label distributions (the adversarial distributions) that (approximately) match the statistics of the training data. Although the optimized loss metrics are non-convex and non-continuous, the dual formulation of the framework is a convex optimization problem that can be recast as a risk minimization model with a prescribed convex surrogate loss, which we call the adversarial surrogate loss. We show that adversarial surrogate losses fill an existing gap in surrogate loss construction for general multiclass classification problems: they simultaneously align better with the original multiclass loss, guarantee Fisher consistency, allow rich feature spaces to be incorporated via the kernel trick, and provide competitive performance in practice.
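The abstract does not write the adversarial surrogate loss out, so the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes the game is played over a loss matrix L, with L[i][j] the loss of predicting label i when the adversary chooses label j, and that the dual variables enter through potentials f_j = θ·φ(x, j). Under those assumptions the surrogate for an example (x, y) is the linear program AL_f(x, y) = max over q in the probability simplex of [ min_i (L q)_i + f·q - f_y ], where the inner minimum is the predictor's best response to the adversary's label distribution q. The helper names (solve_linear, adversarial_surrogate_lp, zero_one_closed_form) are hypothetical. The LP is solved exactly by enumerating its vertices; for the zero-one loss, where (L q)_i = 1 - q_i, it reduces to the closed form max over nonempty label subsets S of (Σ_{j∈S}(f_j - f_y) + |S| - 1)/|S|, which serves as a cross-check.

```python
from itertools import combinations

TOL = 1e-9  # feasibility tolerance for floating-point vertex checks


def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with
    partial pivoting; return None if it is (numerically) singular."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def adversarial_surrogate_lp(loss, f, y):
    """Hypothetical helper (assumed formulation):
    AL_f(x, y) = max_{q in simplex} min_i (L q)_i + f.q - f_y,
    solved exactly by enumerating the vertices of the LP over x = (q, v)."""
    k = len(f)
    # Candidate inequalities that can be active at a vertex, as (row, rhs):
    # q_c >= 0, and v <= (L q)_i written as L[i].q - v = 0 when active.
    nonneg = [([1.0 if j == c else 0.0 for j in range(k)] + [0.0], 0.0) for c in range(k)]
    game = [(list(map(float, loss[i])) + [-1.0], 0.0) for i in range(k)]
    simplex = ([1.0] * k + [0.0], 1.0)  # sum_j q_j = 1 is always active
    best = float("-inf")
    for active in combinations(nonneg + game, k):
        rows = [simplex[0]] + [row for row, _ in active]
        rhs = [simplex[1]] + [r for _, r in active]
        x = solve_linear(rows, rhs)
        if x is None:
            continue
        q, v = x[:k], x[k]
        # Keep only feasible vertices: q >= 0 and v <= (L q)_i for every prediction i.
        if min(q) < -TOL:
            continue
        if any(v > sum(loss[i][j] * q[j] for j in range(k)) + TOL for i in range(k)):
            continue
        best = max(best, v + sum(fj * qj for fj, qj in zip(f, q)) - f[y])
    return best


def zero_one_closed_form(f, y):
    """Zero-one case: max over nonempty S of (sum_{j in S}(f_j - f_y) + |S| - 1) / |S|.
    For each size m the best S holds the m largest potentials."""
    psi = sorted((fj - f[y] for fj in f), reverse=True)
    best, running = float("-inf"), 0.0
    for m, p in enumerate(psi, start=1):
        running += p
        best = max(best, (running + m - 1) / m)
    return best


# Usage: three labels, zero-one loss and absolute (ordinal) loss.
zero_one = [[0 if i == j else 1 for j in range(3)] for i in range(3)]
absolute = [[abs(i - j) for j in range(3)] for i in range(3)]
f = [1.0, 0.5, -0.2]
print(round(adversarial_surrogate_lp(zero_one, f, y=1), 6))  # 0.75
print(zero_one_closed_form(f, y=1))                          # 0.75
print(round(adversarial_surrogate_lp(absolute, f, y=1), 6))
```

Vertex enumeration solves C(2k, k) small linear systems, so it only suits a handful of classes; a practical implementation would pass the same LP to an off-the-shelf solver. The sketch also makes the convexity claim concrete: for each fixed q the bracketed objective is affine in f, so AL_f is a pointwise maximum of affine functions and hence convex in f (and in θ, since f is linear in θ).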
