Binarised regression tasks: methods and evaluation metrics

Some supervised tasks come with a numerical output, but decisions have to be made in a discrete, binarised way, according to a particular cutoff. This binarised regression task is a very common situation that requires its own analysis, different from regression, classification, and ordinal regression. We first examine the application cases in terms of the information available about the distribution and range of the cutoffs, distinguishing six possible scenarios, some of which are more common than others. Next, we study two basic approaches: the retraining approach, which discretises the training set whenever the cutoff is available and learns a new classifier from it, and the reframing approach, which learns a regression model and sets the cutoff when it becomes available during deployment. To assess the binarised regression task, we introduce context plots featuring error against cutoff. Two special cases are of particular interest, the UCE and OCE curves: the area under the UCE curve is the mean absolute error, while the area under the OCE curve is a new metric that lies between a ranking measure and a residual-based measure. A comprehensive evaluation of the retraining and reframing approaches is performed using a repository of binarised regression problems created for this purpose, concluding that neither method is clearly better than the other, except when the size of the training data is small.
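The abstract does not give code for the two approaches, so the following is a minimal Python sketch of the contrast, assuming scikit-learn-style tree learners; the helper names `retrain_at_cutoff` and `reframe` are illustrative, not from the paper.

```python
# Sketch of the retraining vs. reframing approaches (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

def retrain_at_cutoff(X_train, y_train, cutoff):
    """Retraining: discretise the numeric target at the known cutoff and
    fit a fresh binary classifier (one model per cutoff)."""
    y_binary = (y_train >= cutoff).astype(int)
    return DecisionTreeClassifier().fit(X_train, y_binary)

def reframe(X_train, y_train):
    """Reframing: fit a single regression model once; the cutoff is
    applied only when it becomes known at deployment time."""
    reg = DecisionTreeRegressor().fit(X_train, y_train)
    return lambda X, cutoff: (reg.predict(X) >= cutoff).astype(int)

# Example usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

clf = retrain_at_cutoff(X, y, cutoff=0.0)   # a new classifier per cutoff
predict_binary = reframe(X, y)              # one regressor, any cutoff
print(clf.predict(X[:5]), predict_binary(X[:5], 0.0))
```

The practical difference the paper evaluates is visible here: retraining must fit one classifier per cutoff value, whereas reframing fits a single regressor once and can be thresholded at any cutoff revealed during deployment.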
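The claim that the area under the UCE curve is the mean absolute error can be checked numerically. The sketch below assumes (as a reading of the abstract, not a quote) that the per-cutoff error is the disagreement rate between the binarised true values and binarised predictions, with cutoffs ranging uniformly over the output values:

```python
# Numerical check: integrating the error-vs-cutoff curve over the output
# range should recover the mean absolute error (under the assumption
# stated in the lead-in about how the per-cutoff error is defined).
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)
y_pred = y_true + rng.normal(scale=0.5, size=1000)

lo = min(y_true.min(), y_pred.min())
hi = max(y_true.max(), y_pred.max())
cutoffs = np.linspace(lo, hi, 2001)

# error(c): fraction of examples whose true value and prediction fall on
# opposite sides of the cutoff c
err = np.array([np.mean((y_true >= c) != (y_pred >= c)) for c in cutoffs])

# trapezoidal integration of the error curve over the cutoff axis
area = np.sum((err[1:] + err[:-1]) / 2 * np.diff(cutoffs))
mae = np.mean(np.abs(y_true - y_pred))
print(f"area under error-vs-cutoff curve: {area:.4f}  MAE: {mae:.4f}")
```

The identity holds because, for each example, the cutoffs at which the binarised prediction disagrees with the binarised truth form exactly the interval between the prediction and the true value, whose length is the absolute residual.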
