Does Code Quality Affect Pull Request Acceptance? An Empirical Study

Background. Pull requests are a common practice for contributing and reviewing contributions, employed in both open-source and industrial contexts. One of the main goals of code review is to find defects in the code, allowing project maintainers to discuss external contributions and integrate them into a project with confidence.

Objective. The goal of this paper is to understand whether code quality is actually considered when pull requests are accepted. Specifically, we aim to understand whether code quality issues such as code smells, antipatterns, and coding style violations in the pull request code affect the chance of its acceptance when reviewed by a maintainer of the project.

Method. We conducted a case study of 28 Java open-source projects, analyzing the presence of 4.7 M code quality issues in 36 K pull requests. We investigated correlations by applying Logistic Regression and seven machine learning techniques, including Decision Tree, Random Forest, Extremely Randomized Trees, AdaBoost, Gradient Boosting, and XGBoost.

Results. Unexpectedly, code quality turned out not to affect the acceptance of a pull request at all. As suggested by other works, factors such as the reputation of the contributor and the importance of the delivered feature might matter more for pull request acceptance than code quality.

Conclusions. Researchers have already investigated the influence of developers' reputation on pull request acceptance. This is the first work to investigate whether the quality of the code in a pull request affects its acceptance. We recommend that researchers further investigate this topic to determine whether different metrics or different tools could provide more useful insights.
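The kind of analysis described in the Method can be sketched as follows: fit a logistic regression that predicts pull-request acceptance from counts of code-quality issues, then compare predicted acceptance probabilities. This is a minimal pure-Python illustration of the technique, not the paper's pipeline; the per-PR feature vector (code smells, antipatterns, style violations) and the toy labels are invented for the example, and note that on the paper's real data such features showed no effect on acceptance.

```python
# Hedged sketch: logistic regression on hypothetical per-PR quality features.
# Feature layout (an assumption for illustration):
#   [number of code smells, number of antipatterns, number of style violations]
import math

def sigmoid(z):
    # Clamp to avoid math.exp overflow on extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of log-loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Toy dataset: 1 = accepted, 0 = rejected (illustrative labels only).
X = [[0, 0, 1], [1, 0, 2], [5, 2, 9], [0, 1, 0], [7, 3, 12], [2, 0, 3]]
y = [1, 1, 0, 1, 0, 1]

w, b = fit_logistic(X, y)
p_clean = sigmoid(sum(wj * xj for wj, xj in zip(w, [0, 0, 0])) + b)
p_smelly = sigmoid(sum(wj * xj for wj, xj in zip(w, [6, 2, 10])) + b)
print(f"P(accept | clean PR)  = {p_clean:.2f}")
print(f"P(accept | smelly PR) = {p_smelly:.2f}")
```

In the actual study such a model is fitted on real pull-request data and its coefficients and predictive power are inspected; the tree-based classifiers mentioned in the Method play the same role with non-linear decision boundaries.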
