Applying over 100 classifiers for churn prediction in telecom companies

At a time when machine learning is used to solve problems across many fields, practitioners should understand its relevance to their own domain. One of its major applications is predictive analytics, and churn prediction is a key step for customer retention in today's saturated telecom market [31]. Any toolkit that offers insight into churn can therefore be of real benefit to service-providing companies. A major difficulty business analysts face in this process is deciding which classifier to use: with developers constantly proposing new machine learning algorithms, analysts can rarely keep track of all the available options. In this work, we analyze and compare the performance of over 100 well-known classifiers, drawn from different families, on churn prediction for a telecom company. The study can serve as a first step for any data scientist building a churn prediction system, and it identifies the algorithms that perform best on this task. Churn prediction is a mildly imbalanced classification problem, and class imbalance degrades classifier performance. The Regularized Random Forest classifier achieves the highest accuracy. Because the problem is imbalanced, we also evaluate the area under the Receiver Operating Characteristic (ROC) curve, on which the Bagging Random Forest classifier produces the best result.
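To make the evaluation protocol concrete, the sketch below shows, in Python with scikit-learn, how a few classifier families can be compared on both accuracy and ROC AUC. It is a minimal illustration only, not the paper's pipeline: the synthetic dataset, the class ratio of roughly 15% churners, and all hyperparameters are assumptions made for the example.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a telecom churn table: ~15% positives ("churners").
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.85, 0.15], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# A few representative families; the settings here are arbitrary defaults,
# not the tuned configurations evaluated in the study.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "bagged_random_forest": BaggingClassifier(
        RandomForestClassifier(n_estimators=50, random_state=42),
        n_estimators=10, random_state=42),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    # ROC AUC is computed from predicted churn probabilities, so it is not
    # fooled by a classifier that mostly predicts the majority class.
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:22s} accuracy={acc:.3f}  ROC-AUC={auc:.3f}")

A degenerate model that always predicts "no churn" would already reach roughly 85% accuracy at this class ratio, which is why the abstract reports ROC AUC alongside accuracy for the imbalanced setting.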

[1] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[2] Giovanni Seni, et al. Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, 2010, Ensemble Methods in Data Mining.

[3] Ludmila I. Kuncheva, et al. Combining Pattern Classifiers: Methods and Algorithms, 2004.

[4] Andino Maseleno, et al. Optimal feature-based multi-kernel SVM approach for thyroid disease classification, 2018, The Journal of Supercomputing.

[5] Yong Wang, et al. Using Model Trees for Classification, 1998, Machine Learning.

[6] Subhash C. Bagui, et al. Combining Pattern Classifiers: Methods and Algorithms, 2005, Technometrics.

[7] Yuan-Hai Shao, et al. Improvements on Twin Support Vector Machines, 2011, IEEE Transactions on Neural Networks.

[8] R Core Team. R: A Language and Environment for Statistical Computing, 2014.

[9] Ron Kohavi, et al. The Power of Decision Tables, 1995, ECML.

[10] Ian H. Witten, et al. Stacking Bagged and Dagged Models, 1997, ICML.

[11] Y. Takefuji, et al. Functional-link net computing: theory, system architecture, and functionalities, 1992, Computer.

[12] Robert C. Holte, et al. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets, 1993, Machine Learning.

[13] Narasimhan Sundararajan, et al. Risk-sensitive loss functions for sparse multi-category classification problems, 2008, Inf. Sci.

[14] Madan Gopal, et al. Least squares twin support vector machines for pattern classification, 2009, Expert Syst. Appl.

[15] Stefan Kramer, et al. Ensembles of Balanced Nested Dichotomies for Multi-class Problems, 2005, PKDD.

[16] V. Ravi, et al. Analytical CRM in banking and finance using SVM: a modified active learning-based rule extraction approach, 2012.

[17] Ron Kohavi, et al. Wrappers for performance enhancement and oblivious decision graphs, 1995.

[18] John G. Cleary, et al. K*: An Instance-based Learner Using an Entropic Distance Measure, 1995, ICML.

[19] María José del Jesús, et al. On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets, 2010, Inf. Sci.

[20] Preeti K. Dalvi, et al. Analysis of customer churn prediction in telecom industry using decision trees and logistic regression, 2016, 2016 Symposium on Colossal Data Analysis and Networking (CDAN).

[21] Bernhard Pfahringer, et al. Locally Weighted Naive Bayes, 2002, UAI.

[22] Pat Langley, et al. Estimating Continuous Distributions in Bayesian Classifiers, 1995, UAI.

[23] Muhammad Tanveer, et al. A reduced universum twin support vector machine for class imbalance learning, 2020, Pattern Recognit.

[24] Ian H. Witten, et al. Generating Accurate Rule Sets Without Global Optimization, 1998, ICML.

[25] S. Cessie, et al. Ridge Estimators in Logistic Regression, 1992.

[26] Ian H. Witten, et al. Data mining: practical machine learning tools and techniques, 3rd Edition, 1999.

[27] Yu Zhao, et al. Customer Churn Prediction Using Improved One-Class Support Vector Machine, 2005, ADMA.

[28] Guang-Bin Huang, et al. Extreme learning machine: a new learning scheme of feedforward neural networks, 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[29] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification, 2008, J. Mach. Learn. Res.

[30] Muhammad Tanveer, et al. Sparse pinball twin support vector machines, 2019, Appl. Soft Comput.

[31] Eibe Frank, et al. Speeding Up Logistic Model Tree Induction, 2005, PKDD.

[32] Raymond J. Mooney, et al. Constructing Diverse Classifier Ensembles using Artificial Training Examples, 2003, IJCAI.

[33] William W. Cohen. Fast Effective Rule Induction, 1995, ICML.

[34] Ponnuthurai Nagaratnam Suganthan, et al. Comprehensive evaluation of twin SVM based classifiers on UCI datasets, 2019, Appl. Soft Comput.

[35] Reshma Khemchandani, et al. Twin Support Vector Machines for Pattern Classification, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36] Mokhairi Makhtar, et al. A Multi-Layer Perceptron Approach for Customer Churn Prediction, 2015, International Conference on Multimedia and Ubiquitous Engineering.

[37] Eric Johnson, et al. Churn Reduction in the Wireless Industry, 1999, NIPS.

[38] D. Kibler, et al. Instance-based learning algorithms, 2004, Machine Learning.

[39] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[40] M. Tanveer, et al. Machine Learning Techniques for the Diagnosis of Alzheimer’s Disease, 2020.

[41] Muhammad Tanveer, et al. Robust energy-based least squares twin support vector machines, 2015, Applied Intelligence.

[42] Muhammad Tanveer, et al. Improved universum twin support vector machine, 2018, 2018 IEEE Symposium Series on Computational Intelligence (SSCI).

[43] Deepak Gupta, et al. Functional iterative approaches for solving support vector classification problems based on generalized Huber loss, 2019, Neural Computing and Applications.

[45] Senén Barro, et al. Do we need hundreds of classifiers to solve real world classification problems?, 2014, J. Mach. Learn. Res.

[46] Eibe Frank. Fully supervised training of Gaussian radial basis function networks in WEKA, 2014.

[47] Praveen Asthana. A comparison of machine learning techniques for customer churn prediction, 2018.

[48] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods, 1999.

[49] Sundaram Suresh, et al. Fast learning Circular Complex-valued Extreme Learning Machine (CC-ELM) for real-valued classification problems, 2012, Inf. Sci.

[50] Muhammad Tanveer, et al. EEG signal classification using universum support vector machine, 2018, Expert Syst. Appl.

[51] Juan José Rodríguez Diez, et al. Rotation Forest: A New Classifier Ensemble Method, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52] Jalal A. Nasiri, et al. Least squares twin multi-class classification support vector machine, 2015, Pattern Recognit.

[53] Ron Kohavi, et al. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid, 1996, KDD.

[54] Konstantinos I. Diamantaras, et al. A comparison of machine learning techniques for customer churn prediction, 2015, Simul. Model. Pract. Theory.

[55] Asifullah Khan, et al. Genetic Programming and Adaboosting based churn prediction for Telecom, 2012, 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC).

[56] Geoffrey I. Webb, et al. MultiBoosting: A Technique for Combining Boosting and Wagging, 2000, Machine Learning.

[57] Suresh Chandra, et al. Large-Scale Twin Parametric Support Vector Machine Using Pinball Loss Function, 2021, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[58] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[59] Dean Abbott. Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst, 2014.

[60] Marcos de Sales Guerra Tsuzuki, et al. Churn Prediction in Online Games Using Players’ Login Records: A Frequency Analysis Approach, 2015, IEEE Transactions on Computational Intelligence and AI in Games.

[61] J. Ross Quinlan. C4.5: Programs for Machine Learning, 1992.

[62] Dong Xu, et al. L1-norm loss based twin support vector machine for data recognition, 2016, Inf. Sci.

[63] Eibe Frank, et al. A Simple Approach to Ordinal Classification, 2001, ECML.

[64] J. R. Quinlan. Learning With Continuous Classes, 1992.

[65] Eric W. T. Ngai, et al. Customer churn prediction using improved balanced random forests, 2009, Expert Syst. Appl.