Mining Edge-Weighted Call Graphs to Localise Software Bugs

An important problem in software engineering is the automated discovery of occasional, non-crashing bugs. In this work we address this problem and show that mining weighted call graphs of program executions is a promising technique. We mine weighted graphs with a combination of structural and numerical techniques. More specifically, we propose a novel reduction technique for call graphs that introduces edge weights. We then present an analysis technique for such weighted call graphs, based on graph mining and on traditional feature selection schemes. The technique generalises previous graph mining approaches because it also allows the weights to be analysed. Our evaluation shows that our approach finds bugs that previous approaches cannot detect. For bugs that existing techniques can already localise in principle, our technique doubles the precision of localisation.
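The idea of combining edge weights with feature selection can be illustrated with a minimal sketch. It is not the reduction or graph mining procedure of the paper: it assumes that each execution is available as a list of (caller, callee) events, that the edge weight is simply the call frequency, and that information gain over threshold splits stands in for the "traditional feature selection schemes". All function names (`reduce_trace`, `best_info_gain`, `rank_edges`) and the example data are hypothetical.

```python
from collections import Counter
from math import log2


def reduce_trace(calls):
    """Collapse (caller, callee) events into a mapping edge -> call count.

    Assumption: the edge weight is the call frequency of that edge.
    """
    return Counter(calls)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def best_info_gain(values, labels):
    """Highest information gain over all binary threshold splits of a weight."""
    n = len(labels)
    base = entropy(labels)
    best = 0.0
    distinct = sorted(set(values))
    # Candidate thresholds lie midway between neighbouring distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        best = max(best, gain)
    return best


def rank_edges(runs):
    """Rank call-graph edges by how well their weight separates pass from fail.

    runs: list of (trace, label) pairs, label being "pass" or "fail".
    """
    graphs = [(reduce_trace(trace), label) for trace, label in runs]
    labels = [label for _, label in graphs]
    edges = set().union(*(g.keys() for g, _ in graphs))
    # A Counter returns 0 for edges that do not occur in a run.
    scores = {e: best_info_gain([g[e] for g, _ in graphs], labels)
              for e in edges}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical executions: every run contains the same two edges, so the
# graph structure alone cannot tell passing from failing runs.
def trace(n_calls_a_to_b):
    return [("main", "a")] + [("a", "b")] * n_calls_a_to_b


runs = [(trace(2), "pass"), (trace(3), "pass"),
        (trace(40), "fail"), (trace(35), "fail")]
ranked = rank_edges(runs)
print(ranked[0][0])  # the edge whose weight best separates the two classes
```

In this toy data both edges occur in every run, so a purely structural analysis sees identical graphs, whereas the call frequency of the edge from `a` to `b` separates failing from passing runs. This is the kind of frequency-affecting bug that an analysis of weights is meant to expose.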
