Evaluating & improving fault localization techniques

A fault localization technique takes a faulty program as input and produces as output a ranked list of suspicious code locations at which the program may be defective. When researchers propose a new fault localization technique, they evaluate it on programs with known faults and score the technique by where the defective code appears in its output list. This scoring makes it possible to compare fault localization techniques and determine which one performs better. Previous research has evaluated fault localization techniques using artificial faults, generated either by mutation tools or manually. In other words, previous research has determined which fault localization techniques are best at finding artificial faults. However, it is not known which fault localization techniques are best at finding real faults. It is not obvious that the answer is the same, given previous work showing that artificial faults have both similarities to and differences from real faults. We performed a replication study to evaluate 10 claims in the literature that compared fault localization techniques. We used 2273 artificial faults in 5 real-world programs. Our results refute 3 of the previous claims. Then we evaluated the same 10 claims using 297 real faults from the same 5 programs. Every previous result was either refuted or statistically insignificant. In other words, our experiments show that artificial faults are not useful for predicting which fault localization techniques perform best on real faults. In light of these results, we identified a design space that includes many previously studied fault localization techniques as well as hundreds of new techniques. We experimentally determined which factors in the design space are most important, and then we extended the design space with new techniques. Several of our novel techniques outperform all existing techniques, notably in ranking defective code within the top 5 or top 10 reports.
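The evaluation procedure described above can be sketched in Python. A technique's output is scored by where the defective code lands in its ranked list, and the sketch also checks whether that code lands in the top n. This is a minimal illustration under stated assumptions, not the authors' evaluation code. Four choices are assumptions of this sketch: tied locations share the average of the positions they occupy; when a fault spans several locations, the best-ranked defective location counts; the EXAM-style score is the rank divided by the list length; and the location names are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple


def ranks_with_ties(scores: Dict[str, float]) -> Dict[str, float]:
    """Map each code location to its 1-based rank, most suspicious first.

    Assumption: locations with equal suspiciousness share the average of
    the positions they occupy.
    """
    ordered = sorted(scores, key=lambda loc: scores[loc], reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of locations tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_position = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_position
        i = j + 1
    return ranks


def evaluate(scores: Dict[str, float], defective: Iterable[str]) -> Tuple[float, float]:
    """Return (rank, exam) for the best-ranked defective location.

    rank is math.inf if no defective location appears in the ranked list;
    exam is rank divided by the number of ranked locations (lower is better).
    """
    ranks = ranks_with_ties(scores)
    found = [ranks[loc] for loc in defective if loc in ranks]
    if not found:
        return math.inf, math.inf
    best = min(found)
    return best, best / len(ranks)


def in_top_n(scores: Dict[str, float], defective: Iterable[str], n: int) -> bool:
    """True if the best-ranked defective location has (tie-averaged) rank <= n."""
    rank, _ = evaluate(scores, defective)
    return rank <= n


if __name__ == "__main__":
    # Hypothetical suspiciousness scores produced by some technique.
    suspiciousness = {"A.java:10": 0.9, "A.java:12": 0.7, "B.java:5": 0.7, "B.java:8": 0.1}
    print(evaluate(suspiciousness, {"B.java:5"}))      # (2.5, 0.625)
    print(in_top_n(suspiciousness, {"B.java:5"}, 5))   # True
```

With average-rank tie handling, a defective location tied with other locations is not credited with the best position in its tie group. This avoids rewarding a technique for ties it cannot actually break. Other conventions, such as best-case or worst-case rank, would change top-n counts when ties straddle the cutoff.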
