Software Behavior and Failure Clustering: An Empirical Study of Fault Causality

To cluster executions that exhibit faulty behavior by the faults that cause them, researchers have proposed using internal execution events, such as statement profiles, to (1) measure execution similarities, (2) categorize executions based on those similarity results, and (3) suggest the resulting categories as sets of executions exhibiting uniform fault behavior. However, due to a paucity of evidence correlating profiles and output behavior, researchers employ multiple simplifying assumptions to justify such approaches. In this paper we present an empirical study of the correlation between execution profiles and output behavior, and we reexamine the suitability of those simplifying assumptions. We examine over 4 billion test-case outputs and execution profiles from multiple programs with over 9000 versions. Our data provides evidence that, with current techniques, many executions should be omitted from the clustering analysis if each resulting cluster is to represent a single fault. In addition, our data reveals previously undocumented effects of multiple faults on failures, which have implications for the ability (and inability) of techniques to cluster properly. Our results suggest directions for improving future failure-clustering techniques so that they better account for software-fault behavior.
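The three-step approach summarized above (measure similarity, categorize, treat each category as one fault) can be illustrated with a small sketch. It is not the technique evaluated in the paper; it rests on several assumptions of our own: a profile is modeled as the set of executed statement IDs, similarity is the Jaccard index, categorization is single-linkage grouping at a hypothetical `threshold`, and the hypothetical `purity` helper checks whether each cluster is caused by exactly one fault.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Step 1 (assumed metric): similarity of two statement-coverage profiles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_failures(profiles: dict, threshold: float) -> list:
    """Step 2 (assumed method): single-linkage grouping. Executions whose
    profiles are at least `threshold`-similar, directly or transitively,
    land in the same cluster. Implemented with a union-find structure."""
    parent = {e: e for e in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for e in profiles:
        groups.setdefault(find(e), set()).add(e)
    return list(groups.values())


def purity(clusters: list, cause: dict) -> float:
    """Step 3 check (hypothetical helper): fraction of clusters whose
    executions are all caused by the same fault."""
    pure = sum(1 for c in clusters if len({cause[e] for e in c}) == 1)
    return pure / len(clusters)


# Toy data (invented for illustration): four failing executions.
profiles = {
    "t1": frozenset({1, 2, 3, 4}),
    "t2": frozenset({1, 2, 3, 5}),
    "t3": frozenset({7, 8, 9}),
    "t4": frozenset({7, 8, 9, 10}),
}
cause = {"t1": "F1", "t2": "F1", "t3": "F2", "t4": "F1"}

clusters = cluster_failures(profiles, threshold=0.5)
print(sorted(sorted(c) for c in clusters))  # [['t1', 't2'], ['t3', 't4']]
print(purity(clusters, cause))              # 0.5
```

In this invented example, `t3` and `t4` have similar profiles but are caused by different faults, so the profile-based grouping yields a cluster that does not represent a single fault. This is the kind of mismatch between profiles and fault causality that the study sets out to measure.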
