Do as I Do, Not as I Say: Do Contribution Guidelines Match the GitHub Contribution Process?

Developer contribution guidelines are used in social coding sites like GitHub to explain and shape the process a project expects contributors to follow. They set standards for all participants and "save time and hassle caused by improperly created pull requests or issues that have to be rejected and re-submitted" (GitHub). Yet, we lack a systematic understanding of the content of a typical contribution guideline, as well as the extent to which these guidelines are followed in practice. Additionally, understanding how guidelines may impact projects that use Continuous Integration as part of the contribution process is of particular interest. To address this knowledge gap, we conducted a mixed-methods study of 53 GitHub projects with explicit contribution guidelines and coded the guidelines to extract key themes. We then created a process model using GitHub activity data (e.g., commit, new issue, new pull request) to compare the actual activity with the prescribed contribution guidelines. We show that approximately 68% of these projects diverge significantly from the expected process.

[1]  Premkumar T. Devanbu,et al.  Quality and productivity outcomes relating to continuous integration in GitHub , 2015, ESEC/SIGSOFT FSE.

[2]  Meiyappan Nagappan,et al.  Curating GitHub for engineered software projects , 2016, PeerJ Prepr..

[3]  Georgios Gousios,et al.  Work Practices and Challenges in Pull-Based Development: The Integrator's Perspective , 2014, ICSE.

[4]  Daniela E. Damian,et al.  The promises and perils of mining GitHub , 2009, MSR 2014.

[5]  James D. Herbsleb,et al.  Influence of social and technical factors for evaluating contribution in GitHub , 2014, ICSE.

[6]  Marco Aurélio Gerosa,et al.  A systematic literature review on the barriers faced by newcomers to open source software projects , 2015, Inf. Softw. Technol..

[7]  Marco Tulio Valente,et al.  Measuring and analyzing code authorship in 1 + 118 open source projects , 2019, Sci. Comput. Program..

[8]  K. Perreault,et al.  Research Design: Qualitative, Quantitative, and Mixed Methods Approaches , 2011 .

[9]  Christoph Treude,et al.  Categorizing the Content of GitHub README Files , 2018, Empirical Software Engineering.

[10]  Gerardo Canfora,et al.  How Open Source Projects Use Static Code Analysis Tools in Continuous Integration Pipelines , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[11]  Georgios Gousios,et al.  The GHTorent dataset and tool suite , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[12]  Kenichi Yoshida,et al.  How GitHub Contributing.md Contributes to Contributors , 2017, 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC).

[13]  Georgios Gousios,et al.  Work practices and challenges in pull-based development: the contributor's perspective , 2015, ICSE.

[14]  Georgios Gousios,et al.  Oops, My Tests Broke the Build: An Explorative Analysis of Travis CI with GitHub , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).