Constrained feature selection for localizing faults

Developers often take much time and effort to find buggy program elements. To help developers debug, many past studies have proposed spectrum-based fault localization techniques. These techniques compare and contrast correct and faulty execution traces and highlight suspicious program elements. In this work, we propose constrained feature selection algorithms that we use to localize faults. Feature selection algorithms are commonly used to identify important features that are helpful for a classification task. By mapping an execution trace to a classification instance and a program element to a feature, we can transform fault localization to the feature selection problem. Unfortunately, existing feature selection algorithms do not perform too well, and we extend its performance by adding a constraint to the feature selection formulation based on a specific characteristic of the fault localization problem. We have performed experiments on a popular benchmark containing 154 faulty versions from 8 programs and demonstrate that several variants of our approach can outperform many fault localization techniques proposed in the literature. Using Wilcoxon rank-sum test and Cliff's d effect size, we also show that the improvements are both statistically significant and substantial.

[1]  Martin Monperrus,et al.  Learning to Combine Multiple Ranking Metrics for Fault Localization , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[2]  Sarfraz Khurshid,et al.  Software fault localization using feature selection , 2011, MALETS '11.

[3]  Alessandro Orso,et al.  Are automated debugging techniques actually helping programmers? , 2011, ISSTA '11.

[4]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[5]  Mary Jean Harrold,et al.  Empirical evaluation of the tarantula automatic fault-localization technique , 2005, ASE.

[6]  David Lo,et al.  Comprehensive evaluation of association measures for fault localization , 2010, 2010 IEEE International Conference on Software Maintenance.

[7]  Baowen Xu,et al.  A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization , 2013, TSEM.

[8]  N. Cliff Dominance statistics: Ordinal analyses to answer ordinal questions. , 1993 .

[9]  David Lo,et al.  Theory and Practice, Do They Match? A Case with Spectrum-Based Fault Localization , 2013, 2013 IEEE International Conference on Software Maintenance.

[10]  F. Wilcoxon Individual Comparisons by Ranking Methods , 1945 .

[11]  Jian Zhou,et al.  Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[12]  A.J.C. van Gemund,et al.  On the Accuracy of Spectrum-based Fault Localization , 2007, Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007).

[13]  David Lo,et al.  Fusion fault localizers , 2014, ASE.

[14]  David Lo,et al.  Information retrieval and spectrum based bug localization: better together , 2015, ESEC/SIGSOFT FSE.