Diversity maximization speedup for localizing faults in single-fault and multi-fault programs

Fault localization is useful for reducing debugging effort. Such techniques require test cases with oracles, which determine whether a program behaves correctly for each test input. Although most fault localization techniques can localize faults relatively accurately even with a small number of test cases, choosing the right test cases and creating oracles for them is not easy. Test oracle creation is expensive because it can require substantial manual labeling effort (i.e., the effort needed to decide whether each test case passes or fails). Given a set of test cases to be executed, it is challenging to minimize the number of test cases that require manual labeling while still achieving good fault localization accuracy. To address this challenge, this paper presents a novel test case selection strategy based on Diversity Maximization Speedup (Dms). Dms orders a set of unlabeled test cases in a way that maximizes the effectiveness of a fault localization technique. Developers then need to label only a much smaller number of test cases, following this ordering, to achieve good fault localization results. We evaluate the performance of Dms on two types of programs: single-fault and multi-fault programs. Our experiments with 411 faults from the Software-artifact Infrastructure Repository show (1) that Dms can help existing fault localization techniques achieve comparable accuracy with, on average, 67% and 6% fewer labeled test cases than the previously best test case prioritization techniques for single-fault and multi-fault programs, respectively, and (2) that, given a labeling budget (i.e., a fixed number of labeled test cases), Dms can help existing fault localization techniques reduce their debugging cost (measured as the amount of code that must be inspected to locate faults). We conduct hypothesis tests and show that the savings in debugging cost we achieve for the real C programs are statistically significant.
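As a rough illustration of the setting described above (not of the Dms selection strategy itself, which the abstract does not specify), the following sketch shows how labeled test cases feed a spectrum-based fault localization technique and how a debugging cost can be computed from the resulting ranking. It assumes the well-known Ochiai suspiciousness formula as the underlying technique. The function names, the toy coverage data, and the worst-case tie handling are all illustrative assumptions.

```python
import math

def ochiai_scores(coverage, labels):
    """Ochiai suspiciousness per statement.

    coverage: list of sets, the statement ids executed by each test (assumed input format)
    labels:   list of bools, True if the test passed, False if it failed
    """
    total_failed = sum(1 for passed in labels if not passed)
    statements = set().union(*coverage) if coverage else set()
    scores = {}
    for s in statements:
        # ef / ep: failing / passing tests that execute statement s
        ef = sum(1 for cov, ok in zip(coverage, labels) if s in cov and not ok)
        ep = sum(1 for cov, ok in zip(coverage, labels) if s in cov and ok)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[s] = ef / denom if denom > 0 else 0.0
    return scores

def inspection_cost(scores, faulty, num_statements):
    """Fraction of code inspected before reaching the faulty statement.

    Assumption: worst-case tie handling, i.e. every statement scoring at least
    as high as the faulty one is counted as inspected.
    """
    target = scores.get(faulty, 0.0)
    inspected = sum(1 for v in scores.values() if v >= target)
    return inspected / num_statements

# Toy data: three labeled tests over statements 1..3; only the first test fails.
demo_coverage = [{1, 2, 3}, {1, 3}, {1}]
demo_labels = [False, True, True]
demo_scores = ochiai_scores(demo_coverage, demo_labels)
demo_cost = inspection_cost(demo_scores, faulty=2, num_statements=3)
```

In this framing, each entry of `labels` costs one manual oracle decision, so a test selection strategy aims to keep the inspection cost low while labeling as few tests as possible.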
