Maximization of AUC and Buffered AUC in binary classification

In binary classification, performance metrics defined as the probability that some error exceeds a threshold are numerically difficult to optimize directly and also hide potentially important information about the magnitude of errors beyond the threshold. Defining analogous metrics with Buffered Probability of Exceedance (bPOE) instead yields counterpart metrics that resolve both of these issues. We apply this approach to AUC, the Area Under the ROC curve, and define Buffered AUC (bAUC). We show that bAUC can provide insights into classifier performance not revealed by AUC, while remaining closely related to it: bAUC is the tightest concave lower bound on AUC and can be represented as the area under a modified ROC curve. Additionally, while AUC is numerically difficult to optimize directly, we show that bAUC optimization often reduces to convex or linear programming. Extending these results, we show that AUC and bAUC are special cases of Generalized bAUC and that popular Support Vector Machine (SVM) formulations for approximately maximizing AUC are equivalent to direct maximization of Generalized bAUC. We also prove bAUC generalization bounds for these SVMs. As a central component of these results, we provide a novel formula for calculating bPOE, the inverse of Conditional Value-at-Risk. Using this formula, we show that particular bPOE minimization problems reduce to convex and linear programming.
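
As a hedged illustration (a sketch based on the concepts named above, not code from the paper): bPOE at threshold z admits the minimization representation bPOE_z(X) = min_{a >= 0} E[a(X - z) + 1]^+, and an empirical bAUC can then be obtained by applying bPOE at threshold zero to the pairwise ranking errors f(x^-) - f(x^+). The Python sketch below follows this recipe; the function names empirical_bpoe and empirical_bauc, the cap a_max, and the use of SciPy's bounded scalar minimizer are illustrative assumptions, not details taken from the original text.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def empirical_bpoe(losses, z=0.0, a_max=1e6):
    """Empirical buffered probability of exceedance at threshold z.

    Uses the one-dimensional convex minimization
        bPOE_z(X) = min_{a >= 0} E[ a*(X - z) + 1 ]^+ ,
    approximated here over the bounded interval [0, a_max].
    """
    losses = np.asarray(losses, dtype=float)

    def objective(a):
        return np.mean(np.maximum(a * (losses - z) + 1.0, 0.0))

    res = minimize_scalar(objective, bounds=(0.0, a_max), method="bounded")
    # a = 0 always yields objective value 1, so bPOE never exceeds 1.
    return min(res.fun, 1.0)


def empirical_bauc(scores_pos, scores_neg):
    """Empirical bAUC = 1 - bPOE_0 of the pairwise errors f(x^-) - f(x^+)."""
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    diffs = (scores_neg[None, :] - scores_pos[:, None]).ravel()
    return 1.0 - empirical_bpoe(diffs, z=0.0)


if __name__ == "__main__":
    # Toy example: Gaussian scores for the two classes.
    rng = np.random.default_rng(0)
    pos = rng.normal(1.0, 1.0, size=200)  # scores of positive examples
    neg = rng.normal(0.0, 1.0, size=200)  # scores of negative examples

    diffs = (neg[None, :] - pos[:, None]).ravel()
    auc = np.mean(diffs < 0) + 0.5 * np.mean(diffs == 0)  # empirical AUC
    print("AUC  ~", auc)
    print("bAUC ~", empirical_bauc(pos, neg))
```

Because bAUC is a lower bound on AUC, the printed bAUC value should not exceed the printed AUC value; the gap between them reflects how far typical ranking errors extend past the threshold.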
