The foundations of cost-sensitive causal classification

Classification is a well-studied machine learning task that concerns the assignment of instances to one of a set of outcomes. Classification models support the optimization of managerial decision-making across a variety of operational business processes. For instance, customer churn prediction models are used to increase the efficiency of retention campaigns by optimizing the selection of customers to target. Cost-sensitive and causal classification methods have been proposed independently to improve the performance of classification models. The former considers the benefits and costs of correct and incorrect classifications, such as the benefit of a retained customer, whereas the latter estimates the causal effect of an action, such as a retention campaign, on the outcome of interest. This study integrates cost-sensitive and causal classification by developing a unifying evaluation framework. The framework encompasses a range of existing and novel performance measures for evaluating both causal and conventional classification models in a cost-sensitive as well as a cost-insensitive manner. We prove that, for several of these performance measures, conventional classification is a special case of causal classification in which the number of actions equals one. The framework is shown to reduce to application-specific cost-sensitive performance measures that have recently been proposed for evaluating customer retention and response uplift models, and it allows profitability to be maximized when a causal classification model is adopted to optimize decision-making. The proposed framework paves the way toward the development of cost-sensitive causal learning methods and opens new opportunities for improving data-driven business decision-making.
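To make the combination of cost-sensitive evaluation and causal classification more concrete, the following is a minimal sketch in the retention setting, not the framework or the measures proposed in this study. It assumes a single binary action (contact with a retention offer), a model that outputs an estimated uplift per customer (the estimated increase in retention probability caused by the offer), a hypothetical fixed benefit per retained customer and a hypothetical fixed cost per contacted customer. Under these assumptions, contacting a customer has expected incremental profit `benefit * uplift - cost`. The function names `should_target`, `expected_profit_curve` and `best_target_size` are illustrative only.

```python
from typing import List, Sequence, Tuple


def should_target(uplift: float, benefit: float, cost: float) -> bool:
    """Contact a customer only if the expected incremental benefit exceeds the contact cost.

    Hypothetical per-customer profit model: benefit * uplift - cost.
    """
    return benefit * uplift > cost


def expected_profit_curve(
    uplift: Sequence[float], benefit: float, cost: float
) -> List[Tuple[int, float]]:
    """Cumulative expected profit of contacting the k customers with the highest estimated uplift.

    Returns (k, profit) pairs for k = 0..n, where each contacted customer
    contributes benefit * uplift - cost.
    """
    ranked = sorted(uplift, reverse=True)  # rank customers by estimated causal effect
    curve = [(0, 0.0)]
    total = 0.0
    for k, tau in enumerate(ranked, start=1):
        total += benefit * tau - cost
        curve.append((k, total))
    return curve


def best_target_size(
    uplift: Sequence[float], benefit: float, cost: float
) -> Tuple[int, float]:
    """Number of top-ranked customers to contact that maximizes expected profit, with that profit."""
    return max(expected_profit_curve(uplift, benefit, cost), key=lambda kp: kp[1])


if __name__ == "__main__":
    # Hypothetical uplift estimates from some causal classification model.
    scores = [0.30, 0.10, 0.05, -0.02]
    print(expected_profit_curve(scores, benefit=100.0, cost=8.0))
    print(best_target_size(scores, benefit=100.0, cost=8.0))
```

With a benefit of 100 and a cost of 8, the curve peaks at k = 2 with an expected profit of 24, which agrees with the per-customer rule: only customers whose estimated uplift exceeds cost / benefit = 0.08 are worth contacting. Note that the sketch treats the model's uplift estimates as if they were true effects; because individual causal effects are never observed, realized profit on held-out data would in practice have to be estimated, for example from randomized treatment and control groups.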
