On the Shoulders of Giants: A New Dataset for Pull-based Development Research

Pull-based development is a widely adopted paradigm for collaboration in distributed software development, attracting attention from both academia and industry. To better study the pull-based development model, this paper presents a new dataset containing 96 features collected from 11,230 projects and 3,347,937 pull requests. We describe the creation process and explain the features in detail. To the best of our knowledge, our dataset is the largest and most comprehensive one to date, providing a more complete picture for pull-based development research.
