GEV-Canonical Regression for Accurate Binary Class Probability Estimation when One Class is Rare

We consider the problem of binary class probability estimation (CPE) when one class is rare compared to the other. It is well known that standard algorithms such as logistic regression perform poorly in this setting because they tend to underestimate the probability of the rare class. Common fixes include under-sampling and weighting, together with various correction schemes. Recently, Wang & Dey (2010) suggested the use of a parametrized family of asymmetric link functions based on the generalized extreme value (GEV) distribution, which has been used for modeling rare events in statistics. The approach showed promising initial results, but when combined with the logarithmic CPE loss implicitly used in their work, it yields a non-convex composite loss that is difficult to optimize. In this paper, we use tools from the theory of proper composite losses (Buja et al., 2005; Reid & Williamson, 2010) to construct a canonical underlying CPE loss corresponding to the GEV link, which yields a convex proper composite loss that we call the GEV-canonical loss. This loss can be tailored to CPE settings where one class is rare, and is easily minimized using an IRLS-type algorithm similar to that used for logistic regression. Our experiments on both synthetic and real data suggest that the resulting algorithm, which we term GEV-canonical regression, performs well compared to common approaches such as under-sampling and weights-correction for this problem.
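To illustrate why a canonical loss is convenient to optimize, here is a minimal sketch rather than the authors' algorithm. It relies on a standard fact about proper composite losses: when the loss is canonical for the link, its derivative with respect to the linear score u is p(u) - y, and its second derivative is the slope of the inverse link, which is nonnegative. The sketch assumes the Gumbel CDF p = exp(-exp(-u)), the shape-parameter-to-zero member of the GEV family, as the inverse link. The orientation of the link, the one-feature model, the synthetic data, and all function names are assumptions made for illustration. A Newton iteration with step halving stands in for the IRLS-type algorithm mentioned in the abstract.

```python
import math
import random

def inv_link(u):
    """Assumed inverse link: Gumbel CDF p = exp(-exp(-u)) (GEV with shape -> 0)."""
    return math.exp(-math.exp(-max(u, -50.0)))  # clip u to avoid overflow

def inv_link_slope(u):
    """dp/du = exp(-u) * p, nonnegative, so the canonical loss is convex in u."""
    u = max(u, -50.0)
    return math.exp(-u) * math.exp(-math.exp(-u))

def gradient(a, b, xs, ys):
    """Mean gradient of the canonical composite loss: average of (p - y) * (1, x)."""
    n = len(xs)
    res = [inv_link(a + b * x) - y for x, y in zip(xs, ys)]
    return sum(res) / n, sum(r * x for r, x in zip(res, xs)) / n

def fit(xs, ys, iters=50, tol=1e-10):
    """Damped Newton iteration for the model p = inv_link(a + b * x)."""
    n = len(xs)
    base_rate = sum(ys) / n  # assumed to lie strictly between 0 and 1
    a, b = -math.log(-math.log(base_rate)), 0.0  # link applied to the base rate
    ga, gb = gradient(a, b, xs, ys)
    for _ in range(iters):
        norm = math.hypot(ga, gb)
        if norm < tol:
            break
        # Hessian entries: inverse-link slopes act as the per-example weights.
        s = [inv_link_slope(a + b * x) for x in xs]
        haa = sum(s) / n + 1e-12
        hab = sum(si * x for si, x in zip(s, xs)) / n
        hbb = sum(si * x * x for si, x in zip(s, xs)) / n + 1e-12
        det = haa * hbb - hab * hab
        da = -(hbb * ga - hab * gb) / det
        db = -(haa * gb - hab * ga) / det
        step = 1.0
        while step > 1e-8:  # halve the step until the gradient norm shrinks
            na, nb = a + step * da, b + step * db
            nga, ngb = gradient(na, nb, xs, ys)
            if math.hypot(nga, ngb) < norm:
                a, b, ga, gb = na, nb, nga, ngb
                break
            step *= 0.5
        else:
            break  # no improving step found
    return a, b

# Synthetic data with a rare positive class (illustrative values only).
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(2000)]
ys = [1.0 if random.random() < inv_link(-1.0 + 0.8 * x) else 0.0 for x in xs]
a_hat, b_hat = fit(xs, ys)
print("fitted intercept and slope:", a_hat, b_hat)
```

Because the model includes an intercept, a zero gradient implies that the average predicted probability matches the observed rate of the rare class on the training data, the same balance property that logistic regression has under its own canonical link.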

[1]  L. J. Savage.  Elicitation of Personal Probabilities and Expectations , 1971 .

[2]  Gary King,et al.  Logistic Regression in Rare Events Data , 2001, Political Analysis.

[3]  Nuno Vasconcelos,et al.  Risk minimization, probability elicitation, and cost-sensitive SVMs , 2010, ICML.

[4]  P. Green.  Iteratively reweighted least squares for maximum likelihood estimation , 1984 .

[5]  Nitesh V. Chawla,et al.  Editorial: special issue on learning from imbalanced data sets , 2004, SKDD.

[6]  D. Lindley.  Savage, Leonard J , 2006 .

[7]  Taghi M. Khoshgoftaar,et al.  Experimental perspectives on learning from imbalanced data , 2007, ICML '07.

[8]  Nathalie Japkowicz,et al.  The Class Imbalance Problem: Significance and Strategies , 2000 .

[9]  G. Grudic,et al.  Loss Functions for Binary Class Probability Estimation , 2003 .

[10]  Mark D. Reid,et al.  Composite Binary Losses , 2009, J. Mach. Learn. Res.

[11]  Wentong Li,et al.  Estimating conversion rate in display advertising from past performance data , 2012, KDD.

[12]  John Langford,et al.  Cost-sensitive learning by cost-proportionate example weighting , 2003, Third IEEE International Conference on Data Mining.

[13]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res.

[14]  A. Raftery,et al.  Strictly Proper Scoring Rules, Prediction, and Estimation , 2007 .

[15]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[16]  Mark D. Reid,et al.  Surrogate regret bounds for proper losses , 2009, ICML '09.

[17]  G. Brier.  Verification of Forecasts Expressed in Terms of Probability , 1950 .

[18]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[19]  Claudia Czado,et al.  The effect of link misspecification on binary regression inference , 1992 .

[20]  Jesús Cid-Sueiro,et al.  Local estimation of posterior class probabilities to minimize classification errors , 2004, IEEE Transactions on Neural Networks.

[21]  Peter A. Flach,et al.  A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss , 2012 .

[22]  Silvia Angela Osmetti,et al.  Generalized Extreme Value Regression for Binary Rare Events Data: an Application to Credit Defaults , 2011 .

[23]  Shivani Agarwal.  Surrogate Regret Bounds for the Area Under the ROC Curve via Strongly Proper Losses , 2013, COLT.

[24]  Tong Zhang.  Statistical behavior and consistency of classification methods based on convex risk minimization , 2003 .

[25]  Robert C. Holte,et al.  C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling , 2003 .

[26]  Dipak K. Dey,et al.  Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption , 2011, 1101.1373.

[27]  A. Buja,et al.  Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications , 2005 .

[28]  J. Corcoran.  Modelling Extremal Events for Insurance and Finance , 2002 .

[29]  Byron C. Wallace,et al.  Class Probability Estimates are Unreliable for Imbalanced Data (and How to Fix Them) , 2012, 2012 IEEE 12th International Conference on Data Mining.

[30]  S. Nadarajah,et al.  Extreme Value Distributions: Theory and Applications , 2000 .

[31]  M. Schervish.  A General Method for Comparing Probability Assessors , 1989 .

[32]  A. Hendrickson,et al.  Proper Scores for Probability Forecasters , 1971 .

[33]  Foster Provost,et al.  Machine Learning from Imbalanced Data Sets 101 , 2008 .