A Systematic Study of Failure Proximity

Software end-users are the best testers: they keep revealing bugs in software that has already undergone rigorous in-house testing. To leverage their testing efforts, failure reporting components have been widely deployed in released software. Many uses of the collected failure data depend on an effective failure indexing technique, which, in the optimal case, would index all failures due to the same bug together. Unfortunately, the problem of failure proximity, which underpins the effectiveness of an indexing technique, has not been systematically studied. This article presents the first systematic study of failure proximity. A failure proximity consists of two components: a fingerprinting function that extracts signatures from failures, and a distance function that calculates the likelihood that two failures are due to the same bug. By considering different instantiations of the two functions, we study an array of six failure proximities, two of which are new. These proximities range from the simplest approach, which checks failure points, to the most sophisticated approach, which uses fault localization algorithms to extract failure signatures. Besides presenting the technical details of each proximity, we also study the properties of each proximity and the tradeoffs between proximities. Together, these deliver a systematic view of failure proximity.
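
To make the two-component structure concrete, the following is a minimal sketch, not the article's implementation. It assumes a failure is recorded simply as a call stack (innermost frame first), and it pairs two illustrative fingerprinting functions with a Jaccard set distance. The names `failure_point_fingerprint`, `stack_fingerprint`, `jaccard_distance`, and `proximity`, as well as the sample stacks, are hypothetical and chosen only for illustration.

```python
from typing import Callable, FrozenSet, Sequence

# Assumption: a failure is represented only by its call stack, innermost frame first.
Failure = Sequence[str]
Signature = FrozenSet[str]


def failure_point_fingerprint(failure: Failure) -> Signature:
    """Signature made of the innermost frame only (the failure point)."""
    return frozenset(failure[:1])


def stack_fingerprint(failure: Failure) -> Signature:
    """Signature made of every frame on the call stack."""
    return frozenset(failure)


def jaccard_distance(a: Signature, b: Signature) -> float:
    """Set distance in [0, 1]; a smaller value suggests the same underlying bug."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def proximity(
    fingerprint: Callable[[Failure], Signature],
    distance: Callable[[Signature, Signature], float],
) -> Callable[[Failure, Failure], float]:
    """Combine a fingerprinting function and a distance function into a proximity."""
    def compare(f1: Failure, f2: Failure) -> float:
        return distance(fingerprint(f1), fingerprint(f2))
    return compare


# Hypothetical failures, for illustration only.
crash_a = ["parse_header", "read_file", "main"]
crash_b = ["parse_header", "load_config", "main"]
crash_c = ["write_log", "main"]

by_point = proximity(failure_point_fingerprint, jaccard_distance)
by_stack = proximity(stack_fingerprint, jaccard_distance)

print(by_point(crash_a, crash_b), by_point(crash_a, crash_c))
print(by_stack(crash_a, crash_b), by_stack(crash_a, crash_c))
```

Swapping either argument of `proximity` yields a different proximity, which mirrors how different instantiations of the two functions give rise to the array of proximities the article compares. In this sketch the failure-point variant treats `crash_a` and `crash_b` as identical, while the full-stack variant only rates them as closer to each other than to `crash_c`.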
