Determinants of pull-based development in the context of continuous integration

The pull-based development model, widely used by distributed software teams in open source communities, can efficiently gather the wisdom of crowds. Instead of sharing access to a central repository, contributors create a fork, update it locally, and request to have their changes merged back, i.e., submit a pull-request. On the one hand, this model lowers the barrier to entry for potential contributors, since anyone can submit pull-requests to any repository; on the other hand, it increases the burden on integrators, who are responsible for assessing the proposed patches and integrating the suitable changes into the central repository. The role of integrators in pull-based development is crucial: they must not only ensure that pull-requests meet the project’s quality standards before being accepted, but also finish their evaluations in a timely manner. To keep up with the volume of incoming pull-requests, continuous integration (CI) is widely adopted to automatically build and test every pull-request at the time of submission. CI provides extra evidence about the quality of pull-requests, which helps integrators make the final decision (i.e., accept or reject). In this paper, we present a quantitative study that investigates which factors affect the pull-based development process, specifically acceptance and latency, in the context of CI. Using regression modeling on data extracted from a sample of GitHub projects that deploy the Travis-CI service, we find that the evaluation process is complex, requiring many independent variables to explain adequately. In particular, CI is a dominant factor: it not only has a great influence on the evaluation process per se, but also changes the effects of some traditional predictors.
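As a rough illustration of what it means for CI to "change the effects" of another predictor, the sketch below fits a logistic regression of pull-request acceptance with an interaction term between a CI outcome and a patch-size measure. It is only a minimal sketch under stated assumptions: the data are synthetic, the variable names (`ci_passed`, `log_churn`), the coefficient values, and the plain Newton-Raphson fit are all hypothetical choices made for illustration, and none of it reproduces the models or results of the study.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column; y holds 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted acceptance probability
        w = p * (1.0 - p)                     # per-observation variance weights
        grad = X.T @ (y - p)                  # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])         # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Synthetic data only: every number below is an assumption, not a finding.
rng = np.random.default_rng(0)
n = 5000
ci_passed = rng.integers(0, 2, n).astype(float)  # hypothetical CI outcome (1 = build passed)
log_churn = rng.normal(0.0, 1.0, n)              # hypothetical standardized patch size

# Columns: intercept, CI outcome, patch size, CI x patch-size interaction.
X = np.column_stack([np.ones(n), ci_passed, log_churn, ci_passed * log_churn])
true_beta = np.array([-0.5, 2.0, -1.0, 0.8])     # assumed effects used to simulate outcomes

prob = 1.0 / (1.0 + np.exp(-X @ true_beta))
accepted = (rng.random(n) < prob).astype(float)  # simulated accept/reject decisions

beta_hat = fit_logistic(X, accepted)
for name, value in zip(["intercept", "ci_passed", "log_churn", "ci_passed:log_churn"], beta_hat):
    print(f"{name:>22s}  {value:+.3f}")
```

In this toy setup, a non-zero interaction coefficient means the estimated effect of patch size on acceptance differs between pull-requests whose build passed and those whose build failed, which is one way a CI variable can alter the effect of a traditional predictor.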
