Why and how developers fork what from whom in GitHub

Forking is the creation of a new software repository by copying another repository. Though forking is controversial in the traditional open source software (OSS) community, it is encouraged and is a built-in feature in GitHub. Developers freely fork repositories, use the code as their own, and make changes. A deep understanding of repository forking can provide important insights for the OSS community and GitHub. In this paper, we explore why and how developers fork what from whom in GitHub. We collect a dataset containing 236,344 developers and 1,841,324 forks. We conduct surveys and analyze the programming languages and owners of forked repositories. Our main observations are: (1) Developers fork repositories to submit pull requests, fix bugs, add new features, keep copies, and so on. Developers find repositories to fork from various sources: search engines, external sites (e.g., Twitter, Reddit), social relationships, etc. More than 42% of the developers we surveyed agree that an automated recommendation tool would be useful for helping them pick repositories to fork, while more than 44.4% do not value such a tool. Developers care about repository owners when they fork repositories. (2) A repository written in a developer's preferred programming language is more likely to be forked. (3) Developers mostly fork repositories from creators. Compared with unattractive repository owners, attractive repository owners include a higher percentage of organizations, have more followers, and registered earlier in GitHub. Our results show that forking is mainly used for making contributions to original repositories, and that it is beneficial for the OSS community. Moreover, our results show the value of recommendation and provide important insights for GitHub on recommending repositories.
