Fault Prediction based on Software Metrics and SonarQube Rules. Machine or Deep Learning?

Background. Developers spend more time fixing bugs and refactoring code to improve maintainability than developing new features. Researchers have investigated the impact of code quality on fault-proneness, focusing on code smells and code metrics.
Objective. We aim to advance the prediction of fault-inducing commits based on SonarQube, considering the contribution provided by each rule and metric.
Method. We designed and conducted a case study on 33 Java projects, analyzed with SonarQube and SZZ to identify fault-inducing and fault-fixing commits. Moreover, we investigated the fault-proneness of each SonarQube rule and metric using machine and deep learning models.
Results. We analyzed 77,932 commits containing 40,890 faults, affected by 1.9M violations of 174 SonarQube rules, and computed on them the 24 software metrics made available by the tool. Compared to machine learning models, deep learning provided more accurate fault detection and allowed us to identify the fault-prediction power of each SonarQube rule. Fourteen of the 174 violated rules have an importance higher than 1% and account for 30% of the total fault-proneness importance, while the fault-proneness of the remaining 160 rules is negligible.
Conclusion. Future work might consider adopting time-series analysis and anomaly detection techniques to detect more accurately the rules that impact fault-proneness.
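
To make the rule-importance analysis concrete, the following is a minimal sketch of the kind of pipeline the abstract describes: a per-commit feature matrix of SonarQube rule-violation counts, a binary fault-inducing label (as produced by SZZ), a classifier, and a ranking of rules by feature importance against the 1% threshold. Everything here is an illustrative assumption, not the authors' actual code: the data is synthetic in place of the real 33-project dataset, scikit-learn's RandomForestClassifier stands in for the machine and deep learning models actually compared, and the `rule_<i>` names are hypothetical placeholders for real SonarQube rule keys.

```python
# Sketch of fault-proneness rule-importance analysis (assumptions noted above).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the real dataset: one row per commit, one column per
# SonarQube rule (violation counts), plus a binary "fault-inducing" label.
n_commits, n_rules = 5000, 174
X = rng.poisson(lam=0.5, size=(n_commits, n_rules))
# Make a handful of rules genuinely predictive so importances are non-trivial.
signal = X[:, :14].sum(axis=1)
y = (signal + rng.normal(0.0, 1.0, n_commits) > 7).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=300, random_state=42)
model.fit(X_train, y_train)
print("MCC on held-out commits:",
      matthews_corrcoef(y_test, model.predict(X_test)))

# Rank rules by impurity-based feature importance and keep those above 1%,
# mirroring the importance threshold reported in the abstract.
importances = model.feature_importances_
top = [(f"rule_{i}", imp) for i, imp in enumerate(importances) if imp > 0.01]
for name, imp in sorted(top, key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```

In a real replication, the violation counts and labels would come from the SonarQube analysis and the SZZ-identified fault-inducing commits, and the same importance ranking could be read off the trained model per rule and per metric.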
