A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data

The availability of abundant data poses a challenge: how to integrate static customer data with longitudinal behavioral data to improve customer churn prediction. Usually, longitudinal behavioral data are transformed into static data before being included in a prediction model. In this study, a framework with ensemble techniques is presented for customer churn prediction that uses longitudinal behavioral data directly. A novel approach called the hierarchical multiple kernel support vector machine (H-MK-SVM) is formulated, and a three-phase training algorithm for it is developed, implemented, and tested. The H-MK-SVM constructs a classification function by estimating the coefficients of both static and longitudinal behavioral variables during training, without transforming the longitudinal behavioral data. Training the H-MK-SVM also performs feature selection and time subsequence selection, because the coefficients are sparse and the non-zero ones identify the selected variables and subsequences. Computational experiments were conducted on three real-world databases. Results on multiple performance criteria show that the H-MK-SVM, using longitudinal behavioral data directly, performs better than currently available classifiers.
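To make the selection idea concrete, the following is a minimal sketch of a generic multiple kernel combination, not the H-MK-SVM formulation or its three-phase training algorithm. It assumes one Gaussian base kernel per static variable and one per fixed-length window of each longitudinal variable. The names `rbf`, `combined_kernel`, `static_w`, `window_w`, and the toy customers are hypothetical, and the weights are set by hand here, whereas in the paper the coefficients are estimated during training.

```python
import math


def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length numeric sequences."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def combined_kernel(x, z, static_w, window_w, window=2):
    """Weighted sum of base kernels (illustrative assumption, not the paper's model).

    static_w[j]    : weight of the kernel on static variable j
    window_w[v][k] : weight of the kernel on window k of longitudinal variable v
    A zero weight removes that variable or time subsequence from the model.
    """
    total = 0.0
    # One base kernel per static variable.
    for w, a, b in zip(static_w, x["static"], z["static"]):
        if w > 0.0:
            total += w * rbf([a], [b])
    # One base kernel per (longitudinal variable, window) pair.
    for series_w, sx, sz in zip(window_w, x["series"], z["series"]):
        for k, w in enumerate(series_w):
            if w > 0.0:
                lo = k * window
                total += w * rbf(sx[lo:lo + window], sz[lo:lo + window])
    return total


# Two hypothetical customers: 2 static variables, 1 longitudinal variable over 4 periods.
a = {"static": [0.2, 1.0], "series": [[1.0, 0.9, 0.4, 0.1]]}
b = {"static": [0.2, 3.0], "series": [[1.0, 0.9, 0.8, 0.7]]}

static_w = [1.0, 0.0]    # sparse weights: the second static variable is dropped
window_w = [[0.5, 0.0]]  # only the first time window is kept

k_sparse = combined_kernel(a, b, static_w, window_w)
k_full = combined_kernel(a, b, [1.0, 1.0], [[0.5, 0.5]])
```

With the sparse weights, the two customers differ only in the dropped variable and the dropped window, so the combined kernel treats them as identical. With all weights non-zero, those differences lower the similarity. A kernel matrix built this way could then be passed to any SVM solver that accepts precomputed kernels.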
