A Hybrid Approach for Class Imbalance Problem in Customer Churn Prediction: A Novel Extension to Under-sampling

Customer retention is becoming a key success factor for many businesses as market competition increases. Telecom companies in particular face this challenge because the number of service providers is growing rapidly. Hence there is a need to focus on customer churn prediction in order to detect the customers who are likely to churn, i.e. switch from one service provider to another. Several data mining techniques have been applied to classify customers into churn and non-churn categories. However, churn prediction datasets typically have an imbalanced class distribution. One of the commonly used techniques for handling imbalanced data is re-sampling, since it is independent of the classifier being used. In this paper, we develop a hybrid re-sampling approach named SOS-BUS by combining the well-known oversampling technique SMOTE with our novel under-sampling technique. Our methodology aims to identify the necessary majority-class data and avoid removing it, thereby overcoming the limitation of random under-sampling. Experimental results show that the proposed approach outperforms the reference techniques in terms of Area under the ROC Curve (AUC).
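To make the idea of hybrid re-sampling concrete, the following is a minimal sketch that pairs SMOTE-style interpolation of the minority class with plain random under-sampling of the majority class. It is not the SOS-BUS method: the abstract does not specify the novel under-sampling step, so random under-sampling (the baseline the paper seeks to improve on) stands in for it. All function names, the binary 0/1 labels, and the choice of the midpoint between the two class sizes as the common target size are assumptions made for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours (SMOTE-style).
    Assumes at least two minority points."""
    if rng is None:
        rng = random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep n_keep majority points chosen uniformly at random.
    Stand-in only: SOS-BUS replaces this step with its own technique."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(majority, n_keep)


def hybrid_resample(X, y, minority_label=1, majority_label=0, seed=0):
    """Over-sample the minority and under-sample the majority so that both
    classes end up at the midpoint of their original sizes (an assumption)."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    majority = [x for x, lab in zip(X, y) if lab == majority_label]
    target = (len(minority) + len(majority)) // 2
    new_minority = minority + smote_like(minority, target - len(minority), rng=rng)
    new_majority = random_undersample(majority, target, rng=rng)
    X_res = new_majority + new_minority
    y_res = [majority_label] * len(new_majority) + [minority_label] * len(new_minority)
    return X_res, y_res
```

Because re-sampling only changes the training set, the output of `hybrid_resample` can be passed to any classifier, which is the classifier independence the abstract refers to. In an evaluation, only the training split would be re-sampled; the test split used to compute AUC should keep its original class distribution.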