Exploring nested ensemble learners using overproduction and choose approach for churn prediction in telecom industry

Combining multiple classifiers into hybrid learners (ensembles) has gained popularity in recent years. Ensembles are attracting growing interest in data mining because they have reportedly produced better predictions than individual classifiers, which has prompted experimentation with new ways of creating ensembles. This paper presents a study of novel hybrid ways of combining multiple ensemble models using the "overproduction and choose" approach. In contrast to the original concept of ensembles, which combine various individual learners, the proposed ensemble models are combinations of other ensembles. In particular, learners are built as compositions of other learners, producing nested learners. Two such models, named Boosted-Stacked learners and Bagged-Stacked learners, are proposed and shown to outperform traditional ensembles. Experiments are performed in the churn prediction domain using a benchmark customer churn dataset (available in the UCI repository) and a newly created dataset from a South Asian wireless telecom operator (named SATO). The SATO dataset is balanced, with equal numbers of churners and non-churners. The novel Boosted-Stacked and Bagged-Stacked learners achieved accuracies of 98.4% and 97.2%, respectively, on the UCI Churn dataset, outperforming existing state-of-the-art techniques. Furthermore, high accuracy on the SATO dataset validates the effectiveness of the proposed models on balanced as well as imbalanced datasets.
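
The nesting idea can be illustrated with a minimal sketch of a Bagged-Stacked learner: an outer bagging ensemble whose members are themselves stacked ensembles. This is not the authors' implementation. The class names (`Stump`, `Stacked`, `BaggedStacked`), the choice of threshold stumps as base learners, the lookup-table meta-learner, and the synthetic data are all assumptions made for illustration, and the selection ("choose") step is not shown. A Boosted-Stacked learner would presumably replace the outer bagging loop with a boosting procedure.

```python
import random
from collections import Counter


class Stump:
    """Hypothetical weak learner: thresholds a single feature."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        best_correct = -1
        for threshold in sorted({row[self.feature] for row in X}):
            for above in (0, 1):  # label predicted when value >= threshold
                correct = sum(
                    (above if row[self.feature] >= threshold else 1 - above) == label
                    for row, label in zip(X, y)
                )
                if correct > best_correct:
                    best_correct = correct
                    self.threshold, self.above = threshold, above
        return self

    def predict_one(self, row):
        return self.above if row[self.feature] >= self.threshold else 1 - self.above


class Stacked:
    """Inner ensemble: one stump per feature plus a lookup-table meta-learner."""

    def __init__(self, n_features):
        self.bases = [Stump(f) for f in range(n_features)]

    def _meta_features(self, row):
        return tuple(base.predict_one(row) for base in self.bases)

    def fit(self, X, y):
        for base in self.bases:
            base.fit(X, y)
        # Simplification: the meta-learner is trained on in-sample base outputs;
        # stacked generalization normally uses held-out (cross-validated) outputs.
        votes = {}
        for row, label in zip(X, y):
            votes.setdefault(self._meta_features(row), Counter())[label] += 1
        self.table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict_one(self, row):
        return self.table.get(self._meta_features(row), self.default)


class BaggedStacked:
    """Outer ensemble: bagging whose members are themselves stacked learners."""

    def __init__(self, n_features, n_members=15, seed=0):
        self.n_features, self.n_members = n_features, n_members
        self.rng = random.Random(seed)

    def fit(self, X, y):
        self.members = []
        n = len(X)
        for _ in range(self.n_members):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            member = Stacked(self.n_features)
            member.fit([X[i] for i in idx], [y[i] for i in idx])
            self.members.append(member)
        return self

    def predict(self, X):
        # Majority vote over the nested (stacked) members.
        return [
            Counter(m.predict_one(row) for m in self.members).most_common(1)[0][0]
            for row in X
        ]


# Synthetic toy data (not the UCI or SATO data): label is 1 when x0 + x1 > 1.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(300)]
labels = [int(a + b > 1) for a, b in points]
X_train, y_train = points[:200], labels[:200]
X_test, y_test = points[200:], labels[200:]

model = BaggedStacked(n_features=2).fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"toy hold-out accuracy: {accuracy:.2f}")
```

The point of the sketch is structural: the outer ensemble treats each inner ensemble as a single learner, so any ensemble method that only needs `fit` and `predict` from its members can be nested in this way. The accuracy printed here reflects only the toy data and says nothing about the results reported above.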
