Profit maximizing logistic regression modeling for customer churn prediction

Selecting profitable classifiers is becoming increasingly important in real-life settings such as customer churn management campaigns in the telecommunication sector. In previous work, the expected maximum profit (EMP) metric was proposed, which explicitly takes into account the cost of the offer and the customer lifetime value (CLV) of retained customers. It thus permits the selection of the most profitable classifier, which better aligns with the business requirements of end-users and stakeholders. However, modelers are currently limited to applying this metric in the evaluation step. Hence, we expand on this body of work and introduce a classifier that incorporates the EMP metric into the construction of the classification model. Our technique, called ProfLogit, explicitly takes profit maximization concerns into account during the training step rather than the evaluation step. The technique is based on a logistic regression model that is trained using a genetic algorithm (GA). By means of an empirical benchmark study on real-life data sets, we show that ProfLogit generates substantial profit improvements over the classic logistic model for many data sets. In addition, the profit-maximized coefficient estimates differ considerably in magnitude from the maximum likelihood estimates.
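
The abstract does not give the EMP formula or the GA configuration, so the following is only a minimal, hypothetical sketch of the idea of training a logistic model for profit rather than likelihood. It assumes an EMP-style churn profit model in which each contacted would-be churner accepts the offer with probability gamma, drawn from a Beta(ALPHA, BETA) prior, and yields an expected gamma*(CLV - D) - F, while each contacted non-churner is assumed to accept the offer and costs D + F. The parameter values (CLV = 200, D = 10, F = 1, ALPHA = 6, BETA = 14), the function names `emp` and `fit_ga`, and the GA itself (binary tournament selection, blend crossover, Gaussian mutation, elitism) are illustrative assumptions, not ProfLogit's actual implementation. It uses NumPy.

```python
import numpy as np

# Assumed parameter values for illustration only (not taken from the text above).
CLV, D, F = 200.0, 10.0, 1.0      # lifetime value, offer cost, contact cost
ALPHA, BETA = 6.0, 14.0           # Beta(ALPHA, BETA) prior on gamma, the offer-acceptance rate


def emp(y, scores, n_grid=200):
    """Hypothetical EMP-style expected maximum profit per customer.

    y: 1 = churner, 0 = non-churner; customers with higher scores are contacted first.
    """
    y = np.asarray(y, dtype=float)
    order = np.argsort(-np.asarray(scores), kind="stable")
    churners = np.concatenate([[0.0], np.cumsum(y[order])])  # churners among the top k, k = 0..n
    targeted = np.arange(len(y) + 1)                          # k = 0..n customers contacted
    # Midpoint grid over gamma in (0, 1), weighted by the (normalized) Beta density.
    gamma = (np.arange(n_grid) + 0.5) / n_grid
    log_w = (ALPHA - 1) * np.log(gamma) + (BETA - 1) * np.log1p(-gamma)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Profit of contacting the top k: each churner yields gamma*(CLV - D) - F,
    # each non-churner costs D + F. Rows index gamma, columns index k.
    gain = gamma[:, None] * (CLV - D) - F
    profit = gain * churners[None, :] - (D + F) * (targeted - churners)[None, :]
    # Best cutoff per gamma (k = 0 guarantees >= 0), then the expectation over gamma.
    return float(w @ profit.max(axis=1)) / len(y)


def fit_ga(X, y, pop_size=40, generations=60, sigma=0.3, seed=0):
    """Evolve logistic-regression coefficients that maximize emp() on (X, y)."""
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    p = Xb.shape[1]

    def fitness(beta):
        # The logistic probability 1 / (1 + exp(-eta)) is monotone in eta, so ranking
        # by eta = Xb @ beta gives the same ordering while avoiding exp overflow.
        return emp(y, Xb @ beta)

    def tournament(pop, fit):
        # Binary tournament: draw two individuals, keep the fitter one.
        i = rng.integers(len(pop), size=2)
        return pop[i[np.argmax(fit[i])]]

    pop = rng.normal(0.0, 1.0, size=(pop_size, p))
    fit = np.array([fitness(b) for b in pop])
    for _ in range(generations):
        children = [pop[fit.argmax()].copy()]  # elitism: carry over the best individual
        while len(children) < pop_size:
            a, b = tournament(pop, fit), tournament(pop, fit)
            u = rng.random(p)
            child = u * a + (1.0 - u) * b              # blend (arithmetic) crossover
            mask = rng.random(p) < 1.0 / p             # mutate about one coefficient
            child[mask] += rng.normal(0.0, sigma, size=mask.sum())
            children.append(child)
        pop = np.array(children)
        fit = np.array([fitness(b) for b in pop])
    best = fit.argmax()
    return pop[best], fit[best]
```

Because this fitness depends only on how customers are ordered, the intercept and any positive rescaling of the coefficients leave it unchanged; a practical version would constrain or regularize the coefficients, which this sketch does not do.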
