Setting decision thresholds when operating conditions are uncertain

The quality of the decisions made by a machine learning model depends on the data and on the operating conditions during deployment. Operating conditions such as the class distribution and the misclassification costs often change between the time a model is trained and evaluated and the time it is deployed. For a binary classifier that outputs scores, once the new class distribution and the new cost ratio between false positives and false negatives are known, several methods in the literature can be used to choose an appropriate threshold for the classifier’s scores. In many cases, however, the information we have about this operating condition is itself uncertain. Previous work has considered ranges or distributions of operating conditions during deployment, with expected costs calculated over these ranges or intervals, but the decision at each point is still made as if the operating condition were certain. The implications of this assumption have received limited attention: a threshold choice method that is optimal when the operating condition is known exactly may be suboptimal under uncertainty. In this paper we analyse the effect of operating condition uncertainty on the expected loss of different threshold choice methods, both theoretically and experimentally. We model uncertainty as a second conditional distribution over the actual operating condition, formulated so that minimum and maximum uncertainty are both special cases of this general framework. This is complemented by a thorough experimental analysis investigating how different learning algorithms behave for a range of datasets, according to the threshold choice method and the level of uncertainty.
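To make the setting concrete, the sketch below contrasts the loss of a threshold evaluated under a point operating condition with its expected loss when the deployed condition is uncertain. It is a minimal illustration, not the paper’s code: it assumes calibrated probability scores, the cost-proportion parameterisation c = c_FP / (c_FP + c_FN) with the normalisation c_FP + c_FN = 2, synthetic data, and a Beta distribution as one possible way to model uncertainty about c.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

# Toy scores, constructed to be calibrated: P(label = 1 | score = s) = s.
scores = rng.uniform(size=1000)
labels = (rng.uniform(size=1000) < scores).astype(int)
pi1 = labels.mean()          # prior of the positive class
pi0 = 1.0 - pi1

def loss(t, c):
    """Loss Q(t; c) at threshold t under cost proportion
    c = c_FP / (c_FP + c_FN), normalised so that c_FP + c_FN = 2."""
    pred = scores >= t
    fpr = np.mean(pred[labels == 0])    # false positive rate
    fnr = np.mean(~pred[labels == 1])   # false negative rate
    return 2.0 * (c * pi0 * fpr + (1.0 - c) * pi1 * fnr)

# Score-driven threshold choice: with calibrated scores, predicting positive
# when p >= c_FP / (c_FP + c_FN) is Bayes-optimal for a *known* condition,
# i.e. the threshold is simply t = c.
c_assumed = 0.3
print("loss if c is certain:   ", loss(c_assumed, c_assumed))

# Uncertain operating condition: the deployed c is drawn from a distribution
# centred on c_assumed (a Beta, purely as an illustrative choice). The
# threshold is still set for c_assumed, but the loss becomes an expectation
# over the actual condition -- the mismatch analysed in the paper.
strength = 20.0   # larger -> less uncertainty; the Beta mean stays c_assumed
cs = rng.beta(c_assumed * strength, (1 - c_assumed) * strength, size=10000)
print("expected loss, uncertain:", np.mean([loss(c_assumed, c) for c in cs]))
```

Increasing the spread of the distribution (lowering `strength` here) widens the gap between the two numbers, which is the effect the theoretical and experimental analysis quantifies for different threshold choice methods.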
