Paper Information - Comparative Analysis of Machine Learning Techniques for Telecommunication Subscribers' Churn Prediction

Over the last two decades, mobile communication has become a dominant medium of communication. In many countries, especially developed ones, the market is saturated to the extent that each new customer must be won over from competitors. Advances in technology and rapid improvements in the telecom industry have given customers many choices, and public policies and the standardization of mobile communication now allow customers to switch easily from one carrier to another, resulting in a highly fluid market. Customer retention is therefore one of the major tasks for the telecom industry. Churn refers to customers who leave or switch to other service providers. Acquiring new customers is much more expensive than retaining existing ones, so it is far more cost-effective for service providers to predict which customers will churn in the future and to customize services or packages according to those customers' demands. As a result, churn prediction has emerged as one of the most crucial Business Intelligence (BI) applications, aimed at identifying customers who are about to transfer to a competitor. In this paper, we present commonly used data mining techniques for identifying customers who are about to churn. Based on historical data, these methods try to find patterns that identify possible churners. The well-known algorithms used in this research include regression analysis, decision trees, and Artificial Neural Networks (ANNs). The data set used in this study was obtained from the Customer DNA website. It contains traffic data for 106,000 customers and their usage behavior over 3 months, and comprises 48 variables. Spearman's correlation coefficient is used to select the variables of high impact. To address class imbalance in the data set, re-sampling is used. The results show that decision trees are the most accurate classifier for identifying potential churners.
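The two preprocessing steps named in the abstract, Spearman-based variable selection and re-sampling, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the correlation threshold or which re-sampling variant was used, so the 0.5 cutoff, the choice of random under-sampling, the helper names, and the toy columns (`complaints`, `handset_age`) are all assumptions made for the example. The classifier itself is left out.

```python
import random
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def select_features(columns, target, threshold):
    """Keep the names of columns whose |rho| with the target meets the threshold."""
    return [name for name, col in columns.items()
            if abs(spearman(col, target)) >= threshold]

def undersample(rows, labels, seed=0):
    """Randomly drop rows of the larger class until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, n) + rng.sample(neg, n))
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Toy data (invented for illustration): 8 customers, 2 of whom churned.
churn = [0, 0, 0, 0, 0, 0, 1, 1]
columns = {
    "complaints":  [0, 1, 0, 1, 0, 1, 3, 4],
    "handset_age": [5, 1, 4, 2, 5, 1, 2, 4],
}

selected = select_features(columns, churn, threshold=0.5)  # assumed cutoff
rows = list(zip(*(columns[name] for name in selected)))
balanced_rows, balanced_labels = undersample(rows, churn)
print(selected, balanced_labels)
```

Under-sampling is only one form of re-sampling; over-sampling the minority class would be an equally plausible reading of the abstract. The balanced rows would then be passed to whichever classifier is being compared.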
