Who Will Leave the Company?: A Large-Scale Industry Study of Developer Turnover by Mining Monthly Work Report

Software developer turnover has become a major challenge for information technology (IT) companies. The departure of key software developers can cause significant loss to an IT company, since important business knowledge and critical technical skills leave with them. Understanding developer turnover is therefore important for IT companies that want to retain talented developers and reduce the loss caused by departures. Previous studies mainly rely on qualitative observations or simple statistical analysis of developers' activity data to understand developer turnover. In this paper, we investigate whether we can predict the turnover of software developers in non-open source companies by automatically analyzing monthly work reports. The monthly work reports in our study come from two IT companies, where they are used to report a developer's activities and working hours in a month. We investigate whether a developer will leave the company after working there for one year, based on the developer's first six monthly reports. To perform the prediction, we extract many factors from the monthly reports and group them into six dimensions. We apply several classifiers, including naive Bayes, SVM, decision tree, kNN, and random forest. We conduct an experiment on about six years of monthly reports from the two companies, covering 3,638 developers. We find that the random forest classifier achieves the best performance, with an F1-measure of 0.86 for retained developers and an F1-measure of 0.65 for not-retained developers. We also investigate the relationship between our proposed factors and developers' departure, and identify the important factors that indicate a developer's departure. We find that the content of the task reports in the monthly reports, the standard deviation of working hours, and the standard deviation of working hours of project members in the first month are the top three important factors.
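To make the reported evaluation concrete, the following is a minimal sketch of how a per-class F1-measure and a working-hours dispersion factor could be computed. It is an illustration only, not the authors' implementation: the function names `f1_for_class` and `working_hours_std`, the label strings, and the toy data are all hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not say which variant is used.

```python
from statistics import pstdev
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1-measure with `positive` treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct predictions for this class: define F1 as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def working_hours_std(monthly_hours: Sequence[float]) -> float:
    """Dispersion of a developer's reported hours (population std, an assumption)."""
    return pstdev(monthly_hours)


# Toy labels (hypothetical): "retained" vs "left" after the first year.
y_true = ["retained", "retained", "retained", "left", "left", "retained"]
y_pred = ["retained", "retained", "left", "left", "retained", "retained"]

f1_retained = f1_for_class(y_true, y_pred, "retained")
f1_left = f1_for_class(y_true, y_pred, "left")

# Hypothetical hours from a developer's first six monthly reports.
hours_feature = working_hours_std([160, 172, 150, 168, 180, 158])
```

Reporting F1 separately for each class, as the abstract does, matters when the classes are imbalanced: a classifier can score well on the majority class (retained developers) while still missing many of the developers who leave.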
