Profit-based feature selection using support vector machines: General framework and an application for customer retention

Highlights
- A novel profit-based feature selection method for churn prediction with support vector machines (SVM) is presented.
- A backward elimination algorithm is performed to maximize the profit of a retention campaign.
- Our experiments on churn prediction datasets underline the potential of the proposed approaches.

Churn prediction is an important application of classification models, which identify the customers most likely to attrite based on their characteristics, described, e.g., by socio-demographic and behavioral variables. Since more and more such features are nowadays captured and stored in computational systems, handling the resulting information overload appropriately becomes a highly relevant issue when building customer retention systems based on churn prediction models. As a consequence, feature selection is an important step of the classifier construction process. Most feature selection techniques, however, are based on statistically inspired validation criteria, which do not necessarily lead to models that optimize the goals specified by the organization. In this paper we propose a profit-driven approach for classifier construction and simultaneous variable selection based on support vector machines. Experimental results show that our models outperform conventional feature selection techniques, achieving superior performance with respect to business-related goals.
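The idea of selecting features by campaign profit rather than by a statistical criterion can be illustrated with a small, dependency-free sketch. This is not the authors' algorithm; it rests on several assumptions. The campaign parameters (`CLV`, `INCENTIVE`, `CONTACT`, `ACCEPT`) and the simplified profit function are hypothetical. Profit is taken as the best cumulative profit over all cut-offs when customers are contacted in decreasing order of SVM score. The classifier is a plain linear SVM trained with Pegasos-style subgradient steps, swapped in so the example needs no libraries. The greedy rule (drop the feature whose removal gives the highest validation profit, stop once every removal lowers it) is only one possible elimination scheme. All function names and the synthetic data generator are illustrative.

```python
import random

# Hypothetical campaign parameters, chosen for illustration only (not from the paper).
CLV = 200.0        # value of retaining a would-be churner
INCENTIVE = 10.0   # cost of the retention offer, paid by anyone who accepts it
CONTACT = 1.0      # cost of contacting one customer
ACCEPT = 0.3       # probability that a contacted churner accepts the offer and stays


def campaign_profit(scores, labels):
    """Best cumulative profit over all cut-offs when customers are contacted
    in decreasing order of score (contacting nobody yields 0)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = cum = 0.0
    for _, label in ranked:
        if label == 1:  # churner: retained with prob. ACCEPT, contact cost always paid
            cum += ACCEPT * (CLV - INCENTIVE) - CONTACT
        else:           # non-churner: assumed to accept the offer, so pure cost
            cum -= INCENTIVE + CONTACT
        best = max(best, cum)
    return best


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM trained with Pegasos-style stochastic subgradient steps on the
    L2-regularized hinge loss. A constant 1.0 is appended to every row, so the
    last weight acts as a (regularized) intercept."""
    rng = random.Random(seed)
    rows = [list(r) + [1.0] for r in X]
    w = [0.0] * len(rows[0])
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            yi = 1.0 if y[i] == 1 else -1.0
            margin = yi * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wj + eta * yi * xj for wj, xj in zip(w, rows[i])]
    return w[:-1], w[-1]


def score(w, b, X):
    """SVM decision values w.x + b for every row of X."""
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]


def backward_elimination(X_tr, y_tr, X_val, y_val):
    """Greedy backward elimination driven by validation campaign profit."""
    def profit_with(features):
        def project(X):
            return [[row[j] for j in features] for row in X]
        w, b = train_linear_svm(project(X_tr), y_tr)
        return campaign_profit(score(w, b, project(X_val)), y_val)

    selected = list(range(len(X_tr[0])))
    best = profit_with(selected)
    while len(selected) > 1:
        # Profit obtained after dropping each remaining feature in turn.
        trials = [(profit_with([f for f in selected if f != j]), j) for j in selected]
        trial_profit, dropped = max(trials)
        if trial_profit < best:  # every removal lowers profit: stop
            break
        best = trial_profit      # ties favor the smaller feature set
        selected.remove(dropped)
    return selected, best


def make_data(n, seed):
    """Synthetic churn data: features 0 and 1 are shifted for churners, 2-4 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0
        shift = 1.5 if label else 0.0
        X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)]
                 + [rng.gauss(0.0, 1.0) for _ in range(3)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_tr, y_tr = make_data(300, seed=1)
    X_val, y_val = make_data(300, seed=2)
    kept, profit = backward_elimination(X_tr, y_tr, X_val, y_val)
    print("kept features:", kept, "validation profit:", round(profit, 2))
```

Under these assumed parameters each contacted churner is worth 0.3 x (200 - 10) - 1 = 56 in expectation and each contacted non-churner costs 11, so `campaign_profit([0.9, 0.8, 0.1], [1, 0, 1])` returns 101.0 (contacting all three customers is the best cut-off). Because many feature subsets are compared on the same validation set, the final profit of the selected model should be estimated on separate held-out data.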
