Prediction of Employee Turnover in Organizations Using Machine Learning Algorithms: A Case for Extreme Gradient Boosting

Employee turnover has been identified as a key issue for organizations because of its adverse impact on workplace productivity and long-term growth strategies. To address this problem, organizations use machine learning techniques to predict employee turnover. Accurate predictions enable organizations to take action on employee retention or succession planning. However, the data for this modeling problem come from HR Information Systems (HRIS), which are typically underfunded compared with the information systems in other parts of the organization that are more directly tied to its priorities. As a result, the data tend to be noisy, which makes predictive models prone to overfitting and hence inaccurate. This is the key challenge on which this paper focuses, and one that has not been addressed in previous work. The novel contribution of this paper is to explore the application of Extreme Gradient Boosting (XGBoost), a technique that is more robust because of its regularized formulation. Data from the HRIS of a global retailer are used to compare XGBoost against six historically used supervised classifiers and to demonstrate its significantly higher accuracy in predicting employee turnover.

This paper discusses the problem of employee turnover and the key machine learning algorithms that have been used to solve it. Its contribution is to explore XGBoost as an improvement on these traditional algorithms, specifically in its ability to generalize on the noise-ridden data that are prevalent in this domain. This is done by using data from the HRIS of a global retailer, treating the attrition problem as a classification task, and modeling it with supervised techniques. The conclusion is reached by contrasting the superior accuracy of the XGBoost classifier with that of the other techniques and explaining the reason for its superior performance. This paper is structured as follows.
Section II gives a brief overview of the employee turnover problem, the importance of solving it, and the historical work on applying machine learning techniques to it. Section III describes the seven supervised techniques, including XGBoost, that this paper compares. Section IV outlines the experimental design: the characteristics of the dataset, pre-processing, cross-validation, and the choice of metrics for comparing accuracy. Section V presents and discusses the results of the study. Section VI concludes the paper by recommending the XGBoost classifier for predicting turnover.
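To make the regularization argument concrete: in the standard XGBoost formulation, the training objective adds a penalty $\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2$ to the loss for each tree $f$, where $T$ is the number of leaves and $w$ is the vector of leaf weights. With $G$ and $H$ the sums of the first and second loss derivatives over the instances in a leaf, the optimal leaf weight becomes $w^* = -G/(H + \lambda)$, and a split is only worthwhile if its gain exceeds $\gamma$. The sketch below is a toy illustration under stated assumptions (logistic loss, all instances starting from the same raw margin), not the experiment reported in this paper. It shows how these two terms damp the influence of a small group of possibly mislabeled records. All function names are hypothetical; in the xgboost library the corresponding hyperparameters are `lambda` (alias `reg_lambda`) and `gamma` (alias `min_split_loss`).

```python
import math

# Hypothetical helper names throughout: this illustrates the published XGBoost
# leaf-weight and split-gain formulas, not the xgboost library's internal code.


def sigmoid(x: float) -> float:
    """Map a raw margin to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def grad_hess(labels: list[int], margin: float = 0.0) -> tuple[list[float], list[float]]:
    """First and second derivatives of the logistic loss w.r.t. the margin,
    assuming every instance currently has the same raw margin."""
    p = sigmoid(margin)
    grads = [p - y for y in labels]
    hess = [p * (1.0 - p) for _ in labels]
    return grads, hess


def leaf_weight(grads: list[float], hess: list[float], lam: float) -> float:
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -sum(grads) / (sum(hess) + lam)


def split_gain(left: list[int], right: list[int], lam: float, gamma: float,
               margin: float = 0.0) -> float:
    """Gain of splitting a node into `left` and `right`; a negative value
    means the regularized objective prefers not to split."""
    gl, hl = grad_hess(left, margin)
    gr, hr = grad_hess(right, margin)
    G_L, H_L, G_R, H_R = sum(gl), sum(hl), sum(gr), sum(hr)

    def score(G: float, H: float) -> float:
        return G * G / (H + lam)

    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma


if __name__ == "__main__":
    # Toy data (hypothetical): four positive labels, imagined to be noise,
    # isolated from a balanced group of twenty employees.
    noisy = [1, 1, 1, 1]
    balanced = [1] * 10 + [0] * 10

    g, h = grad_hess(noisy)
    for lam in (0.0, 1.0, 10.0):
        print(f"lambda={lam}: leaf weight {leaf_weight(g, h, lam):.4f}")
    # lambda=0.0 -> 2.0000, lambda=1.0 -> 1.0000, lambda=10.0 -> 0.1818

    print(f"gain, no regularization: {split_gain(noisy, balanced, 0.0, 0.0):.4f}")  # 1.6667
    print(f"gain, lambda=1, gamma=1: {split_gain(noisy, balanced, 1.0, 1.0):.4f}")  # -0.2857
```

Without regularization, the four isolated labels receive a leaf weight of 2.0 and the split has positive gain. With $\lambda = 1$ and $\gamma = 1$ the weight is halved and the gain turns negative, so the split would be pruned. Noise in real HRIS data is rarely this tidy; the example only shows the mechanism.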