On the Fault Proneness of SonarQube Technical Debt Violations: A comparison of eight Machine Learning Techniques

Background. The popularity of tools for analyzing Technical Debt, and particularly that of SonarQube, is increasing rapidly. SonarQube proposes a set of coding rules, each of which represents something wrong in the code that will soon be reflected in a fault or will increase maintenance effort. However, while the management of some companies encourages developers not to violate these rules in the first place and to produce code below a certain technical debt threshold, developers are skeptical of their importance. Objective. In order to understand which SonarQube violations are actually fault-prone and to analyze the accuracy of the resulting fault-prediction models, we designed and conducted an empirical study on 21 well-known mature open-source projects. Method. We applied the SZZ algorithm to label fault-inducing commits. We then compared the classification power of eight Machine Learning models (Logistic Regression, Decision Tree, Bagging, Random Forest, Extremely Randomized Trees, AdaBoost, Gradient Boosting, and XGBoost) to obtain the set of violations that are correlated with fault-inducing commits. Finally, we calculated the percentage of violations introduced in the fault-inducing commit and removed in the fault-fixing commit, so as to reduce the risk of spurious correlations. Results. Among the 202 violations defined for Java by SonarQube, only 26 are fault-prone, and even these only to a relatively low degree. Moreover, violations classified as "bugs" by SonarQube hardly ever result in a failure. Consequently, the accuracy of fault prediction based on SonarQube violations is extremely low. Conclusion. The rules applied by SonarQube for calculating technical debt should be thoroughly investigated, and their harmfulness needs to be further confirmed. Therefore, companies should carefully consider which rules they really need to apply, especially if their goal is to reduce fault-proneness.

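To make the model-comparison step concrete, the following is a minimal sketch, not the study's published implementation. It assumes a synthetic dataset (one row per commit, one column per SonarQube violation type, and an SZZ-derived fault-inducing label) and compares the eight classifiers named in the abstract with scikit-learn and xgboost, using cross-validated AUC as one plausible choice of metric.

# Illustrative sketch only: the dataset, feature construction, and metric below
# are assumptions, not the study's published setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Stand-in data: rows = commits, columns = counts of each SonarQube violation
# introduced by the commit, y = 1 if SZZ marked the commit as fault-inducing.
X, y = make_classification(n_samples=2000, n_features=202, n_informative=30,
                           weights=[0.8, 0.2], random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Bagging": BaggingClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Extremely Randomized Trees": ExtraTreesClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "XGBoost": XGBClassifier(random_state=42),
}

# Compare classification power via 10-fold cross-validated AUC.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
    print(f"{name:28s} AUC = {scores.mean():.3f} +/- {scores.std():.3f}")

AUC is used here because fault-inducing commits are typically a minority class; the study may well report additional measures such as precision, recall, or MCC.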