On the differences between quality increasing and other changes in open source Java projects

Alexander Trautsch, Institute of Computer Science, University of Goettingen, Germany. E-mail: alexander.trautsch@cs.uni-goettingen.de
Johannes Erbel, Institute of Computer Science, University of Goettingen, Germany. E-mail: johannes.erbel@cs.uni-goettingen.de
Steffen Herbold, Institute of Software and Systems Engineering, TU Clausthal, Germany. E-mail: steffen.herbold@tu-clausthal.de
Jens Grabowski, Institute of Computer Science, University of Goettingen, Germany. E-mail: grabowski@cs.uni-goettingen.de

arXiv:2109.03544v3 [cs.SE] 18 Nov 2021

Many software metrics are designed to measure aspects that are believed to be related to software quality. Static software metrics such as size, complexity, and coupling are used in defect prediction research as well as in software quality models to evaluate software quality. Static analysis tools also include boundary values for complexity and size that generate warnings for developers. While this indicates a relationship between quality and software metrics, the extent of this relationship is not well understood. Moreover, recent studies found that complexity metrics may be unreliable indicators of the understandability of source code. To explore this relationship, we leverage the intent of developers about what constitutes a quality improvement in their own code base. We manually classify a randomized sample of 2,533 commits from 54 Java open source projects as quality improving or not, depending on the intent of the developer as expressed in the commit message. We distinguish between perfective and corrective maintenance via predefined guidelines and use this data as ground truth for fine-tuning a state-of-the-art deep learning model for natural language processing. The benchmark we provide with our ground truth indicates that the deep learning model can be confidently used for commit intent classification. We use the model to increase our data set to 125,482 commits. Based on the resulting data set, we investigate the differences in size and 14 static source code metrics between changes that increase quality and other changes. In addition, we investigate which files are targets of quality improvements. We find that quality improving commits are smaller than other commits. Perfective changes have a positive impact on static source code metrics, while corrective changes tend to add complexity. Furthermore, we find that files which are the target of perfective maintenance already have a lower median complexity than other files. Our study results provide empirical evidence for which static source code metrics capture quality improvement from the developers' point of view. This has implications for program understanding as well as code smell detection and recommender systems.
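The abstract does not state how the size difference between commit groups is quantified. As a minimal sketch, assuming commit size is measured in changed lines and the groups are compared by their medians and by Cliff's delta (a dominance effect size), such a comparison could look as follows. The function name `cliffs_delta` and all numbers are hypothetical and purely illustrative; they are not taken from the study.

```python
from statistics import median


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all (x, y) pairs.

    Ranges from -1 (every x is smaller than every y)
    to +1 (every x is larger than every y).
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Made-up toy values (changed lines per commit), NOT data from the paper.
perfective_sizes = [3, 8, 12, 5, 20]
other_sizes = [15, 40, 9, 60, 25]

delta = cliffs_delta(perfective_sizes, other_sizes)
print("median perfective:", median(perfective_sizes))
print("median other:", median(other_sizes))
print("Cliff's delta:", delta)
```

A negative delta here means that commits in the first group tend to be smaller than commits in the second group, which is the direction of the size finding reported above.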
