Early prediction for merged vs abandoned code changes in modern code reviews

Context: The modern code review process is an integral part of current software development practice. Considerable effort goes into inspecting code changes, finding defects, suggesting improvements, and addressing reviewers' suggestions. A code review usually involves several iterations in which an author submits code changes and a reviewer gives feedback until the reviewer is happy to accept the change. In around 12% of cases, however, the changes are abandoned, and all of that effort is ultimately wasted.

Objective: In this research, our objective is to design a tool that can predict, at an early stage, whether a code change will be merged or abandoned, in order to reduce the wasted effort of all stakeholders involved (e.g., the change author, the reviewers, and project management). The real-world demand for such a tool was formally identified in a study by Fan et al. [41].

Method: We mined 146,612 code changes from the code reviews of three large and popular open-source software projects, and trained and tested a suite of supervised machine learning classifiers, both shallow and deep learning-based. We consider a total of 25 features of each code change during the training and testing of the models. The features are divided into five dimensions: reviewer, author, project, text, and code.

Results: The best-performing model, named PredCR (Predicting Code Review), is a LightGBM-based classifier that achieves around an 85% AUC score on average and relatively improves on the state of the art [41] by 14-23%. In our extensive empirical study of PredCR on the 146,612 code changes from the three software projects, we find that (1) the new features introduced in PredCR, such as those in the reviewer dimension, are the most informative; (2) compared to the baseline, PredCR is more effective at reducing bias against new developers; and (3) PredCR uses historical data from the code review repository, so its performance improves as a software system evolves and more data becomes available.

Conclusion: PredCR can save time and effort by helping developers and code reviewers prioritize the code changes they are asked to review. Project management can use PredCR to decide how code changes are assigned to code reviewers (e.g., selecting changes that are likely to be merged for review before changes that might be abandoned).
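As a rough illustration of the Method, the sketch below shows a LightGBM-based binary classifier (via the lightgbm Python package) trained on per-change features and evaluated with the AUC score, the metric reported for PredCR. The feature names and synthetic data are hypothetical stand-ins for the paper's 25 features across the reviewer, author, project, text, and code dimensions; this is a minimal sketch, not the actual PredCR implementation.

    # Minimal sketch: LightGBM classifier for merged (1) vs abandoned (0) changes.
    # Feature names and data are hypothetical stand-ins, not the paper's features.
    import numpy as np
    import pandas as pd
    from lightgbm import LGBMClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n = 1000

    # One illustrative feature per dimension the paper names
    # (reviewer, author, project, text, code).
    X = pd.DataFrame({
        "reviewer_past_merge_rate": rng.random(n),        # reviewer dimension
        "author_prior_changes": rng.integers(0, 200, n),  # author dimension
        "project_open_changes": rng.integers(0, 500, n),  # project dimension
        "description_length": rng.integers(0, 2000, n),   # text dimension
        "lines_changed": rng.integers(1, 5000, n),        # code dimension
    })
    # Synthetic label: 1 = merged, 0 = abandoned.
    y = (X["reviewer_past_merge_rate"] + rng.normal(0, 0.3, n) > 0.5).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    model = LGBMClassifier(n_estimators=200, learning_rate=0.05, random_state=42)
    model.fit(X_train, y_train)

    # Score with AUC, the metric the paper reports (~85% for PredCR).
    probs = model.predict_proba(X_test)[:, 1]
    print(f"AUC: {roc_auc_score(y_test, probs):.3f}")

On real review data, the features would be computed from the code review repository's history (e.g., a reviewer's past merge rate), and the labels would come from the recorded merged/abandoned outcome of each change.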

[1] Audris Mockus et al., Predicting risk of software changes, 2000, Bell Labs Technical Journal.

[2] Witold Pedrycz et al., A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction, 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[3] Tian Jiang et al., Personalized defect prediction, 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[4] Ahmed E. Hassan et al., Predicting faults using the complexity of code changes, 2009, 2009 IEEE 31st International Conference on Software Engineering.

[5] Michael W. Godfrey et al., Investigating code review quality: Do people and participation matter?, 2015, 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[6] Xiangping Chen et al., Would the Patch Be Quickly Merged?, 2019, BlockSys.

[7] Gabriele Bavota et al., Four eyes are better than two: On the impact of code reviews on software quality, 2015, 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[8] Audris Mockus et al., A large-scale empirical study of just-in-time quality assurance, 2013, IEEE Transactions on Software Engineering.

[9] Hajimu Iida et al., Review participation in modern code review, 2017, Empirical Software Engineering.

[10] M. Braga et al., Exploratory Data Analysis, 2018, Encyclopedia of Social Network Analysis and Mining, 2nd ed.

[11] Martin J. Shepperd et al., Using simulation to evaluate prediction techniques [for software], 2001, Proceedings Seventh International Software Metrics Symposium.

[12] Alberto Bacchelli et al., Expectations, outcomes, and challenges of modern code review, 2013, 2013 35th International Conference on Software Engineering (ICSE).

[13] Stephan Diehl et al., Small patches get in!, 2008, MSR '08.

[14] Grzegorz Chrupala et al., Predicting the quality of questions on Stackoverflow, 2015, RANLP.

[15] J. Friedman, Stochastic gradient boosting, 2002.

[16] Tracy Hall et al., A Systematic Literature Review on Fault Prediction Performance in Software Engineering, 2012, IEEE Transactions on Software Engineering.

[17] Martin Shepperd, Using Simulation to Evaluate Prediction Techniques, 2001.

[18] Leo Breiman et al., Random Forests, 2001, Machine Learning.

[19] Ying Zou et al., Improving the pull requests review process using learning-to-rank algorithms, 2019, Empirical Software Engineering.

[20] Tie-Yan Liu et al., LightGBM: A Highly Efficient Gradient Boosting Decision Tree, 2017, NIPS.

[21] Song Wang et al., Leveraging Change Intents for Characterizing and Identifying Large-Review-Effort Changes, 2019, PROMISE.

[22] Alberto Bacchelli et al., Code Review for Newcomers: Is It Different?, 2018, 2018 IEEE/ACM 11th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE).

[23] Margaret-Anne D. Storey et al., Understanding broadcast based peer review on open source software projects, 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[24] Magne Jørgensen et al., A Systematic Review of Software Development Cost Estimation Studies, 2007, IEEE Transactions on Software Engineering.

[25] Hajimu Iida et al., Mining the Modern Code Review Repositories: A Dataset of People, Process and Product, 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[26] Christian Bird et al., Characteristics of Useful Code Reviews: An Empirical Study at Microsoft, 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[27] Zeki Mazan et al., Will it pass? Predicting the outcome of a source code review, 2018.

[28] David Lo et al., ELBlocker: Predicting blocking bugs with ensemble imbalance learning, 2015, Inf. Softw. Technol.

[29] Catarina Costa et al., TIPMerge: recommending developers for merging branches, 2016, SIGSOFT FSE.

[30] Cor-Paul Bezemer et al., Revisiting the Performance Evaluation of Automated Approaches for the Retrieval of Duplicate Issue Reports, 2018, IEEE Transactions on Software Engineering.

[31] Arie van Deursen et al., An exploratory study of the pull-based software development model, 2014, ICSE.

[32] Igor Steinmacher et al., Effects of Adopting Code Review Bots on Pull Requests to OSS Projects, 2020, 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[33] Andreas Zeller et al., Mining metrics to predict component failures, 2006, ICSE.

[34] Elaine J. Weyuker et al., Does calling structure information improve the accuracy of fault prediction?, 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[35] Daniel M. Germán et al., Will my patch make it? And how fast? Case study on the Linux kernel, 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[36] Katsuro Inoue et al., Search-Based Peer Reviewers Recommendation in Modern Code Review, 2016, 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[37] David Lo et al., Why is my code change abandoned?, 2019, Inf. Softw. Technol.

[38] Michael W. Godfrey et al., The influence of non-technical factors on code review, 2013, 2013 20th Working Conference on Reverse Engineering (WCRE).

[39] Premkumar T. Devanbu et al., Will They Like This? Evaluating Code Contributions with Language Models, 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[40] Bernd Bischl et al., Tunability: Importance of Hyperparameters of Machine Learning Algorithms, 2018, J. Mach. Learn. Res.

[41] Yuanrui Fan et al., Early prediction of merged code changes to prioritize reviewing tasks, 2018, Empirical Software Engineering.

[42] Pierre Geurts et al., Extremely randomized trees, 2006, Machine Learning.

[43] Tracy Hall et al., Researcher Bias: The Use of Machine Learning in Software Defect Prediction, 2014, IEEE Transactions on Software Engineering.

[44] Michael W. Godfrey et al., Investigating technical and non-technical factors influencing modern code review, 2015, Empirical Software Engineering.

[45] Thomas Zimmermann et al., Improving Code Review by Predicting Reviewers and Acceptance of Patches, 2009.

[46] Abram Hindle et al., On the time-based conclusion stability of cross-project defect prediction models, 2019, Empirical Software Engineering.

[47] Alberto Bacchelli et al., ETA: Estimated Time of Answer Predicting Response Time in Stack Overflow, 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.