Paper Information - Predicting Potential Banking Customer Churn using Apache Spark ML and MLlib Packages: A Comparative Study

This study was conducted on the assumption that the Spark ML package offers much better performance and accuracy than the Spark MLlib package when dealing with big data. The dataset used in the comparison consists of bank customers' transactions. The decision tree algorithm was used with both packages to generate a model that predicts the churn probability of bank customers from their transaction data. Detailed comparison results were recorded, and it was concluded that the ML package and its new DataFrame-based APIs have better evaluation performance and prediction accuracy.

Manal A. Abdel-Fattah | Sherif Kholief | Hend Sayed
