A study of external community contribution to open-source projects on GitHub

Open-source software projects are primarily driven by community contribution. However, commit access to such projects' software repositories is often strictly controlled. These projects prefer to solicit external participation in the form of patches or pull requests. In this paper, we analyze a set of 89 top-starred GitHub projects and their forks to explore the nature and distribution of such community contribution. We first classify commits (and developers) into three categories: core, external, and mutant, and study the relative sizes of these classes through a ring-based visualization. We observe that projects written in mainstream scripting languages such as JavaScript and Python tend to include more external participation than projects written in upcoming languages such as Scala. We also visualize the geographic spread of these communities via geocoding. Finally, we classify the submitted pull requests by type based on their labels and observe that bug fixes are more likely than feature enhancements to be merged into the main projects.
