Failure proximity: a fault localization-based approach

Modern software systems commonly feature an automated failure reporting facility through which a huge number of failing traces are collected every day. To prioritize fault diagnosis, failing traces due to the same fault should be grouped together. Previous methods, hypothesizing that similar failing traces imply the same fault, cluster failing traces based on literal trace similarity, which we call trace proximity. However, because a fault can be triggered in many ways, failing traces due to the same fault can be quite different. As a result, previous methods actually group together traces exhibiting similar behaviors, such as similar branch coverage, rather than traces due to the same fault. In this paper, we propose a new type of failure proximity, called R-Proximity, which regards two failing traces as similar if they suggest roughly the same fault location. The fault location each failing case suggests is obtained automatically with Sober, an existing statistical debugging tool. We show that with R-Proximity, failing traces due to the same fault can be grouped together. In addition, we find that R-Proximity is helpful for statistical debugging: it helps developers interpret and utilize the statistical debugging results. We illustrate the usage of R-Proximity with a case study on the grep program and experiments on the Siemens suite, and the results clearly demonstrate the advantage of R-Proximity over trace proximity.
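
To make the idea concrete, the sketch below approximates the R-Proximity workflow: each failing trace is represented by the ranked list of suspicious program locations that a statistical debugger such as Sober might report for it, pairwise similarity between failures is measured by rank correlation (Kendall's tau), and failures whose rankings largely agree are clustered together. This is a minimal illustration under assumed inputs, not the paper's implementation; the location IDs, rankings, distance mapping, and clustering threshold are hypothetical.

```python
# Minimal sketch (not the paper's implementation) of clustering failures by
# R-Proximity: failures are compared via the suspiciousness rankings a
# statistical debugger (e.g., Sober) would assign to program locations.
# All rankings, IDs, and thresholds below are hypothetical.
from itertools import combinations

from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import kendalltau

# Hypothetical per-failure rankings: each row lists location IDs ordered
# from most to least suspicious for that failing trace.
failure_rankings = [
    [3, 1, 4, 2, 5],  # failure A
    [3, 4, 1, 2, 5],  # failure B -- suggests roughly the same location as A
    [5, 2, 1, 4, 3],  # failure C -- points somewhere else
]

def rank_distance(r1, r2):
    """Distance between two suspiciousness rankings via Kendall's tau."""
    pos1 = {loc: rank for rank, loc in enumerate(r1)}
    pos2 = {loc: rank for rank, loc in enumerate(r2)}
    locs = sorted(pos1)
    tau, _ = kendalltau([pos1[l] for l in locs], [pos2[l] for l in locs])
    return (1.0 - tau) / 2.0  # map correlation in [-1, 1] to distance in [0, 1]

# Condensed pairwise distance vector, then average-linkage clustering.
distances = [rank_distance(a, b) for a, b in combinations(failure_rankings, 2)]
labels = fcluster(linkage(distances, method="average"), t=0.3, criterion="distance")
print(labels)  # failures that suggest the same fault location share a label
```

In this sketch, failures A and B fall into the same cluster while C is separated, mirroring the intuition that failures pointing to roughly the same suspected location are likely due to the same fault.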
