Diversity maximization speedup for fault localization

Fault localization is useful for reducing debugging effort. However, many fault localization techniques require a non-trivial number of test cases with oracles, which determine whether a program behaves correctly for each test input. Test oracle creation is expensive because it can take considerable manual labeling effort. Given a number of test cases to be executed, it is challenging to minimize the number of test cases that require manual labeling while still achieving good fault localization accuracy. To address this challenge, this paper presents a novel test case selection strategy based on Diversity Maximization Speedup (DMS). DMS orders a set of unlabeled test cases in a way that maximizes the effectiveness of a fault localization technique. Developers are expected to label only a much smaller number of test cases along this ordering to achieve good fault localization results. Our experiments with more than 250 bugs from the Software-artifact Infrastructure Repository show (1) that DMS can help existing fault localization techniques achieve comparable accuracy with, on average, 67% fewer labeled test cases than the previously best test case prioritization techniques, and (2) that given a labeling budget (i.e., a fixed number of labeled test cases), DMS can help existing fault localization techniques reduce their debugging cost (in terms of the amount of code that must be inspected to locate faults). We conduct hypothesis tests and show that the savings in debugging cost we achieve for the real C programs are statistically significant.
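To make the notions of "labeled test cases" and "debugging cost" concrete, the following is a minimal sketch of the kind of spectrum-based fault localization technique that a prioritization strategy such as DMS would feed. It is not the DMS algorithm, which the abstract does not specify. It uses the well-known Tarantula suspiciousness formula as a stand-in, and the function names `tarantula` and `inspection_cost`, the toy coverage matrix, and the worst-case tie-breaking rule are all assumptions made for illustration.

```python
from typing import List


def tarantula(coverage: List[List[int]], failed: List[bool]) -> List[float]:
    """Score each statement with the Tarantula suspiciousness formula.

    coverage[t][s] is 1 if labeled test t executes statement s, else 0.
    failed[t] is the oracle label for test t (True means the test fails).
    """
    total_failed = sum(failed)
    total_passed = len(failed) - total_failed
    scores = []
    for s in range(len(coverage[0])):
        fail_cov = sum(1 for t, row in enumerate(coverage) if row[s] and failed[t])
        pass_cov = sum(1 for t, row in enumerate(coverage) if row[s] and not failed[t])
        fail_ratio = fail_cov / total_failed if total_failed else 0.0
        pass_ratio = pass_cov / total_passed if total_passed else 0.0
        denom = fail_ratio + pass_ratio
        # Statements covered by no labeled test get a score of 0.
        scores.append(fail_ratio / denom if denom else 0.0)
    return scores


def inspection_cost(scores: List[float], faulty: int) -> float:
    """Fraction of statements inspected before reaching the faulty one.

    Assumes statements are inspected in descending score order and that
    ties are resolved pessimistically (the faulty statement comes last).
    """
    rank = sum(1 for score in scores if score >= scores[faulty])
    return rank / len(scores)


# Toy example: three labeled tests over four statements (hypothetical data).
coverage = [
    [1, 1, 0, 1],  # test 0, labeled as passing
    [1, 0, 1, 1],  # test 1, labeled as failing
    [1, 1, 0, 0],  # test 2, labeled as passing
]
failed = [False, True, False]

scores = tarantula(coverage, failed)
cost = inspection_cost(scores, faulty=2)  # assume statement 2 holds the bug
print(scores, cost)
```

Each row of `coverage` needs a manually supplied entry in `failed`, which is the labeling effort the abstract aims to reduce. A prioritization strategy decides which rows to label first so that `inspection_cost` drops quickly with few labels.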
