Pull Requests or Commits? Which Method Should We Use to Study Contributors' Behavior?

Social coding environments have grown steadily since the pull-based contribution model became popular. This model has made contributing easier: developers can submit a few pull requests without further commitment. Developers without strong ties to a project, the so-called casual contributors, often make a single contribution before disappearing. Interestingly, some studies identify casual contributors by the number of commits they made, while others use the number of merged pull requests. Does the method used influence the results? In this paper, we replicate a study of casual contributors that relied on commits to identify and analyze them, using the same set of GitHub-hosted software repositories as the original paper. Using pull requests, we found an average of 66% casual contributors (compared with 48.98% when using commits), who were responsible for 12.5% of the accepted contributions (1.73% when using commits). We used a sample of 442 developers to investigate the accuracy of each method. We found that 11.3% of the contributors identified using pull requests were misclassified (26.2% using commits). We also found evidence that pull requests measure the number of contributions more precisely, given that GitHub projects mostly follow the pull-based process. Our results indicate that the method used to mine contributors' data can influence the results. This replication may help improve previous results and reduce the effort for researchers conducting future studies that rely on the number of contributions.
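To make the comparison concrete, the following minimal Python sketch shows how the same classification rule can give different answers depending on whether commits or merged pull requests are counted. It is an illustration only, not the study's actual pipeline: the function name `casual_share`, the threshold of one contribution, and the toy author lists are all assumptions introduced here.

```python
from collections import Counter


def casual_share(authors, max_contributions=1):
    """Return (share of contributors that are casual,
    share of contributions made by casual contributors).

    `authors` holds one entry per contribution (one per commit, or one per
    merged pull request). The threshold of a single contribution is an
    assumption made for this sketch.
    """
    counts = Counter(authors)
    if not counts:
        return 0.0, 0.0
    casual = {a for a, n in counts.items() if n <= max_contributions}
    contributor_share = len(casual) / len(counts)
    contribution_share = sum(counts[a] for a in casual) / sum(counts.values())
    return contributor_share, contribution_share


# Hypothetical toy data: bob's single merged pull request contained two commits.
commit_authors = ["ana", "ana", "ana", "bob", "bob", "cy"]
merged_pr_authors = ["ana", "ana", "bob", "cy"]

by_commits = casual_share(commit_authors)
by_pull_requests = casual_share(merged_pr_authors)
print("commits:", by_commits)
print("pull requests:", by_pull_requests)
```

In this toy data, bob counts as casual when merged pull requests are the unit of contribution but not when commits are, because one pull request can bundle several commits. This is one way the choice of unit can shift both the share of casual contributors and the share of contributions attributed to them.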
