Predicting Churn of Expert Respondents in Social Networks Using Data Mining Techniques: A Case Study of Stack Overflow

In Q&A social networks, the few respondents who answer most of the questions are a key asset to the network. Being able to predict the churn of these expert respondents enables the owners of such networks to take measures to retain them. In this paper, we predict the churn of expert respondents in Stack Overflow. We identify experts based on the InDegree of the respondents and the value of the incentives these respondents earned from the questions they answered in the past. Using four data mining techniques (logistic regression, neural networks, support vector machines and random forests), we predict user churn and evaluate our results with four evaluation metrics: percentage correctly classified (PCC), area under the receiver operating characteristic (ROC) curve, precision and recall. Of the four data mining algorithms, random forests performed best, with a PCC of 76%, an ROC area of 0.82, a precision of 0.76 and a recall of 0.77.
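As a rough illustration of the four evaluation metrics named above, the following sketch computes PCC, precision, recall and ROC area for a binary churn classifier using only the Python standard library. The labels, scores, the 0.5 decision threshold, the convention that 1 means "churned", and all function names are assumptions made for this example; none of them are taken from the paper's data or implementation.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 (assumed: churned) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def pcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage correctly classified, expressed as a fraction in [0, 1]."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predicted churners that actually churned."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual churners that were predicted as churners."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC area via its rank interpretation: the probability that a randomly
    chosen positive receives a higher score than a randomly chosen negative,
    with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true churn labels and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("PCC:", pcc(y_true, y_pred))
print("Precision:", precision(y_true, y_pred))
print("Recall:", recall(y_true, y_pred))
print("ROC area:", roc_auc(y_true, scores))
```

PCC, precision and recall depend on the chosen threshold, whereas the ROC area is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason to report it alongside the other three.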
