Which Pull Requests Get Accepted and Why? A study of popular NPM Packages

Background: Pull Request (PR) Integrators often face challenges in terms of multiple concurrent PRs, so the ability to gauge which of the PRs will get accepted can help them balance their workload. PR creators would benefit from knowing if certain characteristics of their PRs may increase the chances of acceptance. Aim: We modeled the probability that a PR will be accepted within a month after creation using a Random Forest model utilizing 50 predictors representing properties of the author, PR, and the project to which PR is submitted. Method: 483,988 PRs from 4218 popular NPM packages were analysed and we selected a subset of 14 predictors sufficient for a tuned Random Forest model to reach high accuracy. Result: An AUC-ROC value of 0.95 was achieved predicting PR acceptance. The model excluding PR properties that change after submission gave an AUC-ROC value of 0.89. We tested the utility of our model in practical scenarios by training it with historical data for the NPM package \textit{bootstrap} and predicting if the PRs submitted in future will be accepted. This gave us an AUC-ROC value of 0.94 with all 14 predictors, and 0.77 excluding PR properties that change after its creation. Conclusion: PR integrators can use our model for a highly accurate assessment of the quality of the open PRs and PR creators may benefit from the model by understanding which characteristics of their PRs may be undesirable from the integrators' perspective. The model can be implemented as a tool, which we plan to do as a future work.

[1]  Georgios Gousios,et al.  The GHTorent dataset and tool suite , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[2]  Georgios Gousios,et al.  Automatically Prioritizing Pull Requests , 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[3]  Chanchal Kumar Roy,et al.  An insight into the pull requests of GitHub , 2014, MSR 2014.

[4]  J. Herbsleb,et al.  Two case studies of open source software development: Apache and Mozilla , 2002, TSEM.

[5]  Audris Mockus,et al.  Effectiveness of code contribution: from patch-based to pull-request-based tools , 2016, SIGSOFT FSE.

[6]  Gang Yin,et al.  Who Should Review this Pull-Request: Reviewer Recommendation to Expedite Crowd Collaboration , 2014, 2014 21st Asia-Pacific Software Engineering Conference.

[7]  Georgios Gousios,et al.  Work practices and challenges in pull-based development: the contributor's perspective , 2015, ICSE.

[8]  Eleni Constantinou,et al.  On the Impact of Security Vulnerabilities in the npm Package Dependency Network , 2018, 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR).

[9]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[10]  James D. Herbsleb,et al.  Influence of social and technical factors for evaluating contribution in GitHub , 2014, ICSE.

[11]  Takashi Ishio,et al.  Towards Smoother Library Migrations: A Look at Vulnerable Dependency Migrations at Function Level for npm JavaScript Packages , 2018, 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[12]  Daniel M. Germán,et al.  Will my patch make it? And how fast? Case study on the Linux kernel , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[13]  James D. Herbsleb,et al.  Social coding in GitHub: transparency and collaboration in an open software repository , 2012, CSCW.

[14]  Philippe Suter,et al.  A Look at the Dynamics of the JavaScript Package Ecosystem , 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[15]  Premkumar T. Devanbu,et al.  Wait for It: Determinants of Pull Request Evaluation Latency on GitHub , 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[16]  Jia-Huan He,et al.  Who should comment on this pull request? Analyzing attributes for more accurate commenter recommendation in pull-based development , 2017, Inf. Softw. Technol..

[17]  Gang Yin,et al.  Reviewer Recommender of Pull-Requests in GitHub , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[18]  Audris Mockus,et al.  Patterns of Effort Contribution and Demand and User Classification based on Participation Patterns in NPM Ecosystem , 2019, PROMISE.

[19]  Audris Mockus,et al.  World of Code: An Infrastructure for Mining the Universe of Open Source VCS Data , 2019, 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR).

[20]  Audris Mockus,et al.  Are Software Dependency Supply Chain Metrics Useful in Predicting Change of Popularity of NPM Packages? , 2018, PROMISE.

[21]  Audris Mockus,et al.  Impact of Triage: A Study of Mozilla and Gnome , 2013, 2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement.

[22]  Leonardo Gresta Paulino Murta,et al.  Acceptance factors of pull requests in open-source projects , 2015, SAC.

[23]  Michael W. Godfrey,et al.  The Secret Life of Patches: A Firefox Case Study , 2012, 2012 19th Working Conference on Reverse Engineering.

[24]  Daniel M. Germán,et al.  Peer Review on Open-Source Software Projects: Parameters, Statistical Models, and Theory , 2014, TSEM.

[25]  Minghui Zhou,et al.  Be careful of when: an empirical study on time-related misuse of issue tracking data , 2018, ESEC/SIGSOFT FSE.

[26]  Stephan Diehl,et al.  Small patches get in! , 2008, MSR '08.

[27]  Gang Yin,et al.  Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment? , 2016, Inf. Softw. Technol..

[28]  Arie van Deursen,et al.  An exploratory study of the pull-based software development model , 2014, ICSE.

[29]  Marco Tulio Valente,et al.  Understanding the Factors That Impact the Popularity of GitHub Repositories , 2016, 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME).