Pull Request Decision Explained: An Empirical Overview

Context: The pull-based development model is widely used in open source and leads the trend in distributed software development. One aspect that has garnered significant attention is the pull request decision, with many studies identifying factors that explain it. Objective: This study builds on a decade of research on the pull request decision in order to explain it. We empirically investigate how factors influence the pull request decision and which scenarios change the influence of those factors. Method: We identify factors influencing the pull request decision on GitHub through a systematic literature review and infer them by mining archival data. We collect a total of 3,347,937 pull requests with 95 features from 11,230 diverse projects on GitHub. Using this data, we explore the relations of the factors to each other and build mixed-effect logistic regression models to empirically explain the pull request decision. Results: Our study shows that a small number of factors explain the pull request decision, with whether the integrator is the same as or different from the submitter being the most important factor. We also note that some factors are important only in special cases; for example, the percentage of failed builds is important for the pull request decision when continuous integration is used.
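
For readers unfamiliar with the modelling approach named in the Method, a mixed-effect logistic regression can be sketched in its generic textbook form as follows. This is an illustration only and not necessarily the exact specification used in the study: the choice of the project as the grouping level for the random intercept, and the symbols y, x, beta, u, and sigma, are assumptions introduced here.

```latex
% Generic random-intercept logistic model (illustrative; symbols are assumptions).
% y_{ij} = 1 if pull request i in project j is merged, 0 otherwise.
% x_{ijk} = value of factor k for that pull request; K = number of factors.
\[
  \operatorname{logit}\,\Pr(y_{ij} = 1)
    = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{ijk} + u_j ,
  \qquad
  u_j \sim \mathcal{N}\!\left(0, \sigma_u^2\right)
\]
```

In this form, the fixed-effect coefficients \beta_k describe how each factor shifts the log-odds of a merge, while the random intercept u_j absorbs baseline differences between groups (here assumed to be projects).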
