On the abandonment and survival of open source projects: An empirical investigation

Background: The evolution of open source projects frequently depends on a small number of core developers. The loss of such core developers can be detrimental to a project and may even threaten its continuation. However, new core developers may take over maintenance and allow the project to survive. Aims: The objective of this paper is to provide empirical evidence on: 1) the frequency of project abandonment and survival, 2) the differences between abandoned and surviving projects, and 3) the motivations and difficulties faced when taking over an abandoned project. Method: We adopt a mixed-methods approach to investigate project abandonment and survival. We carefully select 1,932 popular GitHub projects, identify the abandoned and surviving projects among them, and conduct a survey with developers who were instrumental in the survival of the projects. Results: We found that 315 projects (16%) were abandoned and that 128 of these projects (41%) survived because new core developers took over development. The survey indicates that (i) in most cases the new maintainers were aware of the project abandonment risks when they started to contribute; (ii) their own usage of the systems is their main motivation for contributing to such projects; (iii) human and social factors played a key role when making these contributions; and (iv) lack of time and the difficulty of obtaining push access to the repositories are the main barriers they faced. Conclusions: Project abandonment is a reality even in large open source projects, and our work enables a better understanding of such risks and highlights ways of avoiding them.
