Using automated program repair for evaluating the effectiveness of fault localization techniques

Many techniques on automated fault localization (AFL) have been introduced to assist developers in debugging. Prior studies evaluate the localization technique from the viewpoint of developers: measuring how many benefits that developers can obtain from the localization technique used when debugging. However, these evaluation approaches are not always suitable, because it is difficult to quantify precisely the benefits due to the complex debugging behaviors of developers. In addition, recent user studies have presented that developers working with AFL do not correct the defects more efficiently than ones working with only traditional debugging techniques such as breakpoints, even when the effectiveness of AFL is artificially improved. In this paper we attempt to propose a new research direction of developing AFL techniques from the viewpoint of fully automated debugging including the program repair of automation, for which the activity of AFL is necessary. We also introduce the NCP score as the evaluation measurement to assess and compare various techniques from this perspective. Our experiment on 15 popular AFL techniques with 11 subject programs shipping with real-life field failures presents the evidence that these AFL techniques performing well in prior studies do not have better localization effectiveness according to NCP score. We also observe that Jaccard has the better performance over other techniques in our experiment.

[1]  Claire Le Goues,et al.  A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[2]  Shan Lu,et al.  Automated atomicity-violation fixing , 2011, PLDI '11.

[3]  Lionel C. Briand,et al.  A practical guide for using statistical tests to assess randomized algorithms in software engineering , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[4]  Mary Jean Harrold,et al.  An empirical study of the effects of test-suite reduction on fault localization , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[5]  Claire Le Goues,et al.  Measuring Code Quality to Improve Specification Mining , 2012, IEEE Transactions on Software Engineering.

[6]  Frank Tip,et al.  Directed test generation for effective fault localization , 2010, ISSTA '10.

[7]  John A. Clark,et al.  Efficient Software Verification: Statistical Testing Using Automated Search , 2010, IEEE Transactions on Software Engineering.

[8]  Baowen Xu,et al.  A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization , 2013, TSEM.

[9]  Michael I. Jordan,et al.  Scalable statistical bug isolation , 2005, PLDI '05.

[10]  Martin Burger,et al.  Minimizing reproduction of software failures , 2011, ISSTA '11.

[11]  Mary Jean Harrold,et al.  Empirical evaluation of the tarantula automatic fault-localization technique , 2005, ASE.

[12]  Alessandro Orso,et al.  Are automated debugging techniques actually helping programmers? , 2011, ISSTA '11.

[13]  Premkumar T. Devanbu,et al.  To what extent could we detect field defects? an empirical study of false negatives in static bug finding tools , 2012, 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering.

[14]  Bertrand Meyer,et al.  Code-based automated program fixing , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[15]  F. Wilcoxon Individual Comparisons by Ranking Methods , 1945 .

[16]  Bertrand Meyer,et al.  Inferring better contracts , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[17]  Yuhua Qi,et al.  Making automatic repair for large-scale programs more efficient using weak recompilation , 2012, 2012 28th IEEE International Conference on Software Maintenance (ICSM).

[18]  A.J.C. van Gemund,et al.  On the Accuracy of Spectrum-based Fault Localization , 2007, Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007).

[19]  Andreas Zeller Automated Debugging: Are We Close , 2001, Computer.

[20]  Peter Zoeteweij,et al.  A practical evaluation of spectrum-based fault localization , 2009, J. Syst. Softw..

[21]  Claire Le Goues,et al.  GenProg: A Generic Method for Automatic Software Repair , 2012, IEEE Transactions on Software Engineering.

[22]  Alessandro Orso,et al.  Isolating failure causes through test case generation , 2012, ISSTA 2012.

[23]  John T. Stasko,et al.  Visualization of test information to assist fault localization , 2002, ICSE '02.

[24]  A. Vargha,et al.  A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong , 2000 .

[25]  Frank Tip,et al.  Automated repair of HTML generation errors in PHP applications using string constraint solving , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[26]  Andrea Arcuri,et al.  On the automation of fixing software bugs , 2008, ICSE Companion '08.

[27]  Eric A. Brewer,et al.  Pinpoint: problem determination in large, dynamic Internet services , 2002, Proceedings International Conference on Dependable Systems and Networks.

[28]  James H. Andrews,et al.  Evaluating the Accuracy of Fault Localization Techniques , 2009, 2009 IEEE/ACM International Conference on Automated Software Engineering.

[29]  Lee Naish,et al.  A model for spectra-based software diagnosis , 2011, TSEM.

[30]  Andrea Arcuri,et al.  Evolutionary repair of faulty software , 2011, Appl. Soft Comput..

[31]  Alessandro Orso,et al.  BugRedux: Reproducing field failures for in-house debugging , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[32]  Westley Weimer,et al.  A human study of patch maintainability , 2012, ISSTA 2012.

[33]  Andreas Zeller,et al.  Simplifying and Isolating Failure-Inducing Input , 2002, IEEE Trans. Software Eng..