Predicting Which Pull Requests Will Get Reopened in GitHub

In GitHub, integrators inspect submitted code changes, make evaluation decisions, and close pull requests. However, some pull requests are later reopened for further modification and code review. It is important to predict reopened pull requests immediately after their first close, so that integrators can reopen them in time. Pull requests that are reopened long after being closed may conflict with newly submitted pull requests, add to software maintenance cost, and increase the burden on already busy developers. To the best of our knowledge, this is the first study of predicting reopened pull requests in GitHub. We propose DTPre, an automatic predictor of reopened pull requests based on a decision tree classifier. DTPre mainly analyzes code features of the modified changes, review features from the evaluation process, and developer features of the contributors. We evaluate the effectiveness of DTPre on 7 open source projects containing 100,622 pull requests. Experimental results show that DTPre performs well, achieving a precision of 95.53%, a recall of 99.01%, and an F1-measure of 97.23% on average. Compared with predictors based on neural network, naïve Bayes, logistic regression, and SVM, DTPre improves the F1-measure by 41.76%, 59.45%, 42.25%, and 9.98%, respectively, on average across the 7 projects.
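For readers unfamiliar with the reported metrics, the following minimal sketch shows how precision, recall, and F1-measure are conventionally computed for a binary predictor such as one that labels pull requests as reopened or not. The function names and the confusion counts are illustrative assumptions, not taken from the paper, and the sketch does not reproduce DTPre itself.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard binary-classification metrics from confusion counts.

    tp: reopened pull requests correctly predicted as reopened
    fp: non-reopened pull requests wrongly predicted as reopened
    fn: reopened pull requests the predictor missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """F1-measure as the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical counts, chosen only for illustration.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)

# Harmonic mean of the averaged precision and recall quoted in the abstract.
f1_of_averages = f1_from(0.9553, 0.9901)
```

The harmonic mean of the averaged precision and recall need not exactly match an F1-measure that was averaged per project, so a small difference from the reported 97.23% is expected.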
