A dataset for pull-based development research

Pull requests form a new method for collaborating in distributed software development. To study the pull request distributed development model, we constructed a dataset of almost 900 projects and 350,000 pull requests, including some of the largest users of pull requests on GitHub. In this paper, we describe how the projects were selected, analyze the selected features, and present a machine learning tool set for the R statistics environment.
