Will Fault Localization Work for These Failures? An Automated Approach to Predict Effectiveness of Fault Localization Tools

Debugging is a crucial yet expensive activity for improving the reliability of software systems. To reduce debugging cost, various fault localization tools have been proposed. A spectrum-based fault localization tool typically outputs an ordered list of program elements, sorted by their likelihood of being the root cause of a set of failures (i.e., their suspiciousness scores). Unfortunately, despite the many studies on fault localization, the root causes of many bugs appear low in this ordered list, which can cause developers to distrust fault localization tools. Indeed, Parnin and Orso recently highlighted in their user study that many developers do not find fault localization useful unless the root cause appears early in the list. To alleviate this issue, we build an oracle that predicts whether the output of a fault localization tool can be trusted. If the output is unlikely to be trustworthy, developers need not spend time going through the list of most suspicious program elements one by one; instead, they can resort to other conventional means of debugging. To construct the oracle, we extract the values of a number of features that are potentially related to the effectiveness of fault localization. Building on advances in machine learning, we process these feature values to learn a discriminative model that predicts whether a fault localization tool's output will be effective. In this preliminary work, we deem an output of a fault localization tool effective if the root cause appears among the top 10 most suspicious program elements. We have evaluated our proposed oracle on 200 faulty programs from Space, NanoXML, XML-Security, and the seven programs in the Siemens test suite. Our experiments demonstrate that we can predict the effectiveness of a fault localization tool with a precision of 54.36%, a recall of 95.29%, and an F-measure (the harmonic mean of precision and recall) of 69.23%. These numbers indicate that many ineffective fault localization instances are identified correctly, while only very few effective ones are misclassified.

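The pipeline described above (label each fault localization run as effective or not, extract features from the ranked output, train a discriminative model, and measure precision/recall/F-measure) can be made concrete with a small sketch. The Python code below is a minimal illustration assuming scikit-learn; the feature set (the top-10 suspiciousness scores plus the gaps between consecutive ranks) and the helper names are hypothetical stand-ins, since the abstract does not enumerate the paper's actual features.

```python
# Minimal sketch of an oracle that predicts fault localization effectiveness.
# Assumes scikit-learn; the features below are illustrative, not the paper's.
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_fscore_support

TOP_K = 10  # an FL output is "effective" if the root cause ranks in the top 10


def is_effective(ranked_elements, root_cause):
    """Ground-truth label for one FL run: is the root cause in the top K?"""
    return int(root_cause in ranked_elements[:TOP_K])


def extract_features(scores):
    """Hypothetical features from a descending list of suspiciousness scores:
    the top-K scores (zero-padded) plus the gaps between consecutive ranks."""
    top = list(scores[:TOP_K]) + [0.0] * max(0, TOP_K - len(scores))
    gaps = [a - b for a, b in zip(top, top[1:])]
    return top + gaps


def train_oracle(runs):
    """Fit a discriminative model; each run is (ranked_elements, scores, root_cause)."""
    X = [extract_features(scores) for _, scores, _ in runs]
    y = [is_effective(ranked, root) for ranked, _, root in runs]
    model = SVC(kernel="linear")
    model.fit(X, y)
    return model


def evaluate(model, runs):
    """Precision, recall, and F-measure of the oracle on held-out FL runs."""
    X = [extract_features(scores) for _, scores, _ in runs]
    y = [is_effective(ranked, root) for ranked, _, root in runs]
    p, r, f, _ = precision_recall_fscore_support(y, model.predict(X),
                                                 average="binary")
    return p, r, f
```

In practice, such a model would be trained on faulty versions with known root causes and consulted before a developer inspects a new ranked list.
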
[1] Frank Tip et al. Directed test generation for effective fault localization. ISSTA '10, 2010.

[2] Hong Cheng et al. Identifying bug signatures using discriminative graph mining. ISSTA, 2009.

[3] Hong Cheng et al. Bug Signature Minimization and Fusion. IEEE 13th International Symposium on High-Assurance Systems Engineering, 2011.

[4] Siau-Cheng Khoo et al. A discriminative model approach for accurate duplicate bug report retrieval. ACM/IEEE 32nd International Conference on Software Engineering, 2010.

[5] James R. Larus et al. The use of program profiling for software maintenance with applications to the year 2000 problem. ESEC '97/FSE-5, 1997.

[6] Gregg Rothermel et al. Supporting Controlled Experimentation with Testing Techniques: An Infrastructure and its Potential Impact. Empirical Software Engineering, 2005.

[7] Steven P. Reiss et al. Fault localization with nearest neighbor queries. 18th IEEE International Conference on Automated Software Engineering, 2003.

[8] David Lo et al. Improved Duplicate Bug Report Identification. 16th European Conference on Software Maintenance and Reengineering, 2012.

[9] Gregory Tassey. The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology, 2002.

[10] Yann-Gaël Guéhéneuc et al. Support vector machines for anti-pattern detection. 27th IEEE/ACM International Conference on Automated Software Engineering, 2012.

[11] Michael McGill et al. Introduction to Modern Information Retrieval. 1983.

[12] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science, 2000.

[13] Chao Liu et al. SOBER: statistical model-based bug localization. ESEC/FSE-13, 2005.

[14] Vladimir Cherkassky et al. The Nature of Statistical Learning Theory. IEEE Trans. Neural Networks, 1997.

[15] R. Suganya et al. Data Mining Concepts and Techniques. 2010.

[16] David Lo et al. Interactive fault localization leveraging simple user feedback. 28th IEEE International Conference on Software Maintenance (ICSM), 2012.

[17] Gregg Rothermel et al. An empirical investigation of the relationship between spectra differences and regression faults. 2000.

[18] Jiawei Han et al. Data Mining: Concepts and Techniques. 2000.

[19] David G. Stork et al. Pattern Classification. 1973.

[20] Michael I. Jordan et al. Bug isolation via remote program sampling. PLDI, 2003.

[21] Sunghun Kim et al. Predicting recurring crash stacks. 27th IEEE/ACM International Conference on Automated Software Engineering, 2012.

[22] Haibo He et al. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 2009.

[23] David Lo et al. Comprehensive evaluation of association measures for fault localization. IEEE International Conference on Software Maintenance, 2010.

[24] Raúl A. Santelices et al. Lightweight fault-localization using multiple coverage types. IEEE 31st International Conference on Software Engineering, 2009.

[25] Trishul M. Chilimbi et al. HOLMES: Effective statistical debugging via efficient path profiling. IEEE 31st International Conference on Software Engineering, 2009.

[26] Boris Beizer et al. Software Testing Techniques (2nd ed.). 1990.

[27] Xiangyu Zhang et al. Locating faults through automated predicate switching. ICSE, 2006.

[28] David Lo et al. Search-based fault localization. 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), 2011.

[29] Ken-ichi Matsumoto et al. Studying re-opened bugs in open source software. Empirical Software Engineering, 2012.

[30] Jiawei Han et al. Generalized Fisher Score for Feature Selection. UAI, 2011.

[31] David Lo et al. Extracting Paraphrases of Technical Terms from Noisy Parallel Software Corpora. ACL, 2009.

[32] Rajiv Gupta et al. Fault localization using value replacement. ISSTA '08, 2008.

[33] A.J.C. van Gemund et al. On the Accuracy of Spectrum-based Fault Localization. Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007), 2007.

[34] Mary Jean Harrold et al. Empirical evaluation of the tarantula automatic fault-localization technique. ASE, 2005.

[35] LiGuo Huang et al. AutoODC: Automated generation of orthogonal defect classifications. 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), 2011.

[36] Ferdian Thung et al. Automatic Defect Categorization. 19th Working Conference on Reverse Engineering, 2012.

[37] Jian Zhou et al. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. 34th International Conference on Software Engineering (ICSE), 2012.

[38] Thomas J. Ostrand et al. Experiments on the effectiveness of dataflow- and control-flow-based test adequacy criteria. 16th International Conference on Software Engineering, 1994.

[39] Sigrid Eldh. Software Testing Techniques. 2007.

[40] Michael I. Jordan et al. Latent Dirichlet Allocation. J. Mach. Learn. Res., 2001.

[41] Gail C. Murphy et al. Who should fix this bug? ICSE, 2006.

[42] Alessandro Orso et al. Are automated debugging techniques actually helping programmers? ISSTA '11, 2011.

[43] Xiangyu Zhang et al. Locating faulty code using failure-inducing chops. ASE, 2005.

[44] A. Zeller. Isolating cause-effect chains from computer programs. SIGSOFT '02/FSE-10, 2002.

[45] David Lo et al. Diversity maximization speedup for fault localization. 27th IEEE/ACM International Conference on Automated Software Engineering, 2012.