Can We Predict the Quality of Spectrum-based Fault Localization?

Fault localization and repair are time-consuming and tedious, and there is a significant and growing need for automated techniques to support these tasks. Despite considerable progress in this area, existing fault localization techniques are not yet widely applied in practice, and their effectiveness varies greatly from case to case. Existing work proposes new algorithms and ideas, as well as adjustments to test suites, to improve the effectiveness of automated fault localization. However, important questions remain open: Why is the effectiveness of these techniques so unpredictable? Which factors influence the effectiveness of fault localization? Can we accurately predict fault localization effectiveness? In this paper, we try to answer these questions by collecting 70 static, dynamic, test suite, and fault-related metrics that we hypothesize are related to effectiveness. Our analysis shows that a combination of only a few static, dynamic, and test metrics enables the construction of a prediction model with excellent discrimination power between levels of effectiveness (eight metrics yield an AUC of 0.86; fifteen metrics yield an AUC of 0.88). The model hence yields a practically useful confidence factor that can be used to assess the potential effectiveness of fault localization. Because these metrics are the most influential in explaining the effectiveness of fault localization, they can also guide corrective actions on code and test suites that lead to more effective fault localization.
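
To make the reported AUC values concrete, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The function name `auc`, the binary labels (1 standing for "fault localization was effective"), and the scores are illustrative assumptions; they are not the paper's data or model.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = localization was effective, 0 = it was not.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical confidence scores produced by some prediction model.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

example_auc = auc(labels, scores)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so values such as 0.86 and 0.88 mean the model ranks an effective case above an ineffective one in most pairs.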
