Software-Defect Localisation by Mining Dataflow-Enabled Call Graphs

Defect localisation is essential in software engineering and an important task in domain-specific data mining. Existing techniques that build on call-graph mining can localise different kinds of defects. However, these techniques focus on defects that affect the control flow and are agnostic regarding the dataflow. In this paper, we introduce dataflow-enabled call graphs that incorporate abstractions of the dataflow. Building on these graphs, we present an approach for defect localisation. The creation of the graphs and the defect localisation are essentially data mining problems, making use of discretisation, frequent subgraph mining and feature selection. We demonstrate the defect-localisation qualities of our approach with a study on defects introduced into Weka. Defect localisation improves markedly as a result: a developer has to investigate on average only 1.5 out of 30 methods to fix a defect.
