Revisiting the applicability of the pareto principle to core development teams in open source software projects

It is often observed that the majority of the development work of an Open Source Software (OSS) project is contributed by a core team, i.e., a small subset of the pool of active devel- opers. In fact, recent work has found that core development teams follow the Pareto principle — roughly 80% of the code contributions are produced by 20% of the active developers. However, those findings are based on samples of between one and nine studied systems. In this paper, we revisit prior studies about core developers using 2,496 projects hosted on GitHub. We find that even when we vary the heuristic for detecting core developers, and when we control for system size, team size, and project age: (1) the Pareto principle does not seem to apply for 40%-87% of GitHub projects; and (2) more than 88% of GitHub projects have fewer than 16 core developers. Moreover, we find that when we control for the quantity of contributions, bug fixing accounts for a similar proportion of the contributions of both core (18%-20%) and non-core developers (21%-22%). Our findings suggest that the Pareto principle is not compatible with the core teams of many GitHub projects. In fact, several of the studied GitHub projects are susceptible to the “bus factor,” where the impact of a core developer leaving would be quite harmful.

[1]  James M. Bieman,et al.  The FreeBSD project: a replication case study of open source development , 2005, IEEE Transactions on Software Engineering.

[2]  Stefan Koch,et al.  Effort, co‐operation and co‐ordination in an open source software project: GNOME , 2002, Inf. Syst. J..

[3]  Jacques Klein,et al.  Got issues? Who cares about it? A large scale investigation of issue trackers from GitHub , 2013, 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE).

[4]  Jesús M. González-Barahona,et al.  Evolution of the core team of developers in libre software projects , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[5]  Georgios Gousios,et al.  Work practices and challenges in pull-based development: the contributor's perspective , 2015, ICSE.

[6]  Georgios Gousios,et al.  The GHTorent dataset and tool suite , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[7]  Dewayne E. Perry,et al.  Toward understanding the rhetoric of small source code changes , 2005, IEEE Transactions on Software Engineering.

[8]  Daniel M. Germán,et al.  What do large commits tell us?: a taxonomical study of large commits , 2008, MSR '08.

[9]  Daniela E. Damian,et al.  The promises and perils of mining GitHub , 2009, MSR 2014.

[10]  T. Mens,et al.  Evidence for the Pareto principle in Open Source Software Activity , 2011 .

[11]  Kouichi Kishida,et al.  Evolution patterns of open-source software systems and communities , 2002, IWPSE '02.

[12]  Jaco Geldenhuys,et al.  Finding the Core Developers , 2010, 2010 36th EUROMICRO Conference on Software Engineering and Advanced Applications.

[13]  D HerbslebJames,et al.  Two case studies of open source software development , 2002 .

[14]  Gregorio Robles,et al.  Remote analysis and measurement of libre software systems by means of the CVSAnalY tool , 2004, ICSE 2004.

[15]  Daniel M. Germán,et al.  A study of the contributors of PostgreSQL , 2006, MSR '06.

[16]  Kevin Crowston,et al.  Core and Periphery in Free/Libre and Open Source Software Team Communications , 2006, Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06).

[17]  Kouichi Kishida,et al.  Toward an understanding of the motivation of open source software developers , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[18]  Andreas Zeller,et al.  The impact of tangled code changes , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[19]  James D. Herbsleb,et al.  Social coding in GitHub: transparency and collaboration in an open software repository , 2012, CSCW.

[20]  Daniel M. Germán,et al.  Will my patch make it? And how fast? Case study on the Linux kernel , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[21]  Michele Lanza,et al.  On the nature of commits , 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering - Workshops.

[22]  J. Herbsleb,et al.  Two case studies of open source software development: Apache and Mozilla , 2002, TSEM.

[23]  Georgios Gousios,et al.  Work Practices and Challenges in Pull-Based Development: The Integrator's Perspective , 2014, ICSE.

[24]  Eric S. Raymond,et al.  The cathedral and the bazaar - musings on Linux and Open Source by an accidental revolutionary , 2001 .

[25]  Jordi Cabot,et al.  Assessing the bus factor of Git repositories , 2015, 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[26]  Marco Torchiano,et al.  On the Difficulty of Computing the Truck Factor , 2011, PROFES.

[27]  Marco Torchiano,et al.  Is my project's truck factor low?: theoretical and empirical considerations about the truck factor threshold , 2011, WETSoM '11.

[28]  Arie van Deursen,et al.  An exploratory study of the pull-based software development model , 2014, ICSE.