Hyperparameter Optimization of Artificial Neural Network in Customer Churn Prediction using Genetic Algorithm

Purpose of the article: A company's ability to predict customer churn and retain customers is considered a valuable competitive advantage, since it improves cost allocation in customer retention programs and helps retain future revenue and profits. It also has several positive indirect effects, such as increased customer loyalty. The article therefore focuses on building a highly reliable and robust classification model for this task.

Methodology/methods: The analysis is carried out on a labelled e-commerce retail dataset describing the 10,000 most valuable customers, i.e. those with the highest CLV (Customer Lifetime Value). To obtain the best-performing ANN (Artificial Neural Network) classification model, the proposed hyperparameter search space is explored with a genetic algorithm to find suitable parameter settings. Classification performance is measured as prediction ability, defined as the point estimate of the mean AUC (Area Under Curve) over 4-fold cross-validation. The explored part of the hyperparameter search space is then analyzed with a conditional inference tree, which captures the underlying structure of the optimization and identifies the critical factors that lead to a well-performing ANN classification model.

Scientific aim: To present and execute an experimental design for performance evaluation and hyperparameter optimization of classification models used for customer churn prediction.

Findings: In the experimental context described, it is shown statistically that the regularization parameter and the training function have a significant influence on the classifier's AUC performance, in contrast to the other properties of the ANN. More specifically, well-performing ANN classification models have the regularization parameter set to 0, the training function set to trainlm or trainscg, and more than 100 training epochs. The global optimum is identified for the solution with the regularization parameter set to 0, the trainlm training function, 350 training epochs and a 7-4-2 architecture.

Conclusions: The results imply that applying hyperparameter optimization to the ANN classification model leads to improved customer churn prediction ability. The article describes the design and execution of a machine learning pipeline, the hyperparameter optimization, and an original meta-analysis of the results with a conditional inference tree, all of which are considered beneficial for further research.
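To make the search procedure more concrete, the following is a minimal Python sketch of a genetic algorithm exploring a discrete hyperparameter space. It is not the article's implementation. The search space values, the operators (uniform crossover, tournament selection, elitism), and all function names are assumptions chosen for illustration. In particular, `toy_fitness` is an invented stand-in that only counts satisfied conditions; in a real pipeline it would be replaced by a function that trains the ANN and returns the mean AUC over 4-fold cross-validation.

```python
import random

# Illustrative search space: these values are assumptions, not the article's grid.
SEARCH_SPACE = {
    "regularization": [0.0, 0.1, 0.3, 0.5],
    "train_fn": ["trainlm", "trainscg", "traingd"],
    "epochs": [50, 100, 150, 200, 250, 300, 350, 400],
    "hidden_units": [2, 4, 6, 8],
}


def random_individual(rng):
    """Draw one hyperparameter setting uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return {
        name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
        for name, value in individual.items()
    }


def tournament(population, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=20, generations=15, seed=0):
    """Maximise `fitness` over SEARCH_SPACE; returns (best_setting, best_score)."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for generation in range(generations):
        scores = [fitness(ind) for ind in population]
        for ind, score in zip(population, scores):
            if score > best_score:
                best, best_score = dict(ind), score
        if generation == generations - 1:
            break  # the last population has been scored; no need to breed again
        next_population = [dict(best)]  # elitism: keep the best setting seen so far
        while len(next_population) < pop_size:
            parent_a = tournament(population, scores, rng)
            parent_b = tournament(population, scores, rng)
            next_population.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_population
    return best, best_score


def toy_fitness(individual):
    """Invented stand-in for the mean 4-fold AUC; it returns a count, not an AUC."""
    return (
        int(individual["regularization"] == 0.0)
        + int(individual["train_fn"] in ("trainlm", "trainscg"))
        + int(individual["epochs"] > 100)
    )


best_setting, best_value = genetic_search(toy_fitness)
print(best_setting, best_value)
```

Passing the fitness function as an argument keeps the search loop independent of how a candidate is evaluated, so the toy scorer can be swapped for a cross-validated AUC estimate without touching the genetic operators.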
