Churn Prediction in SaaS using Machine Learning

Anton Rautio: Churn Prediction in SaaS using Machine Learning. Master’s Thesis, Tampere University, Knowledge Management, May 2019.

Customer churn occurs in the Software-as-a-Service (SaaS) business much as it does in other subscription-based industries, such as telecommunications. However, companies lack knowledge of the factors that lead customers to churn and are therefore unable to react in time. Researching customer churn prediction is thus necessary for companies that want to respond to churn before it happens. The study examines customer churn prediction quantitatively, using several machine learning algorithms implemented in Python: a recurrent neural network, a convolutional neural network, a support vector machine, and a random forest. Data was collected from the case company’s database and transformed to fit the algorithms. The dataset includes customer business data such as spend, platform usage data, customer service history data, and customer feedback data on service quality. Grid search was carried out to find the optimal hyperparameters for each algorithm. The models were then trained and evaluated on the prepared data using these hyperparameters, after which the test data was run through the trained models to obtain the results of the analysis. The results show that the most precise machine learning algorithm in this case is the support vector machine. The deep learning algorithms, namely the recurrent neural network and the convolutional neural network, did not perform well. Random forest performed moderately, coming close to the support vector machine. The random forest algorithm also offered a view of the importance of each feature in the prediction and showed that platform usage metrics, service quality metrics, and business metrics are the largest drivers of churn in this case.
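The workflow described above (grid search per algorithm, evaluation on held-out test data, and feature importances from the random forest) can be illustrated with a minimal sketch. It is not the thesis's implementation: the use of scikit-learn, the synthetic dataset standing in for the case company's data, the hyperparameter grids, and the choice of precision as the scoring metric are all assumptions made for illustration. The neural network models are omitted to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the churn dataset (assumption:
# the real data comes from the case company's database).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Hypothetical hyperparameter grids; the thesis's grids are not given here.
candidates = {
    "svm": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Cross-validated grid search on the training split only.
    search = GridSearchCV(estimator, grid, scoring="precision", cv=3)
    search.fit(X_train, y_train)
    # Evaluate the refitted best model on the untouched test split.
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "test_precision": precision_score(y_test, y_pred, zero_division=0),
        "model": search.best_estimator_,
    }

# Impurity-based feature importances from the tuned random forest.
importances = results["random_forest"]["model"].feature_importances_

for name, res in results.items():
    print(name, res["best_params"], round(res["test_precision"], 3))
print("feature importances:", importances.round(3))
```

On real data, the columns would carry names such as usage or spend metrics, so the importance vector could be read against them directly; with the synthetic features here the values only demonstrate the mechanics.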
