Statistical software debugging

Statistical debugging combines statistical machine learning with software debugging. Given sampled run-time profiles from both successful and failed runs, the task is to select a small set of program predicates that succinctly capture the failure modes and thereby lead to the locations of the bugs. Given the diverse nature of software bugs and coding structure, this is not a trivial task. We start by assuming that there is only one bug in the program, which allows us to concentrate on the problem of non-deterministic bugs. We design a utility function whose components may be adjusted based on the suspected level of determinism of the bug. The algorithm works well on two real-world programs. The problem becomes much more complicated once we drop the single-bug assumption: the original single-bug algorithm does not perform well in the presence of multiple bugs, and our initial attempts at clustering fall short of an effective solution. After identifying the main problems in the multi-bug case, we present an iterative predicate scoring algorithm. We demonstrate the algorithm on five real-world programs, where it successfully clusters runs and identifies important predicates that clearly point to many of the underlying bugs.
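To make the setting concrete, the following is a minimal sketch of ranking predicates from run profiles. It is not the utility function or the iterative scoring algorithm described above, whose details are not given here. It assumes a simple stand-in score: the fraction of failed runs in which a predicate was observed true minus the fraction of successful runs in which it was observed true. The name `rank_predicates`, the `Run` representation, and the example predicates are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical representation: each run is the set of predicates
# observed to be true, plus a flag saying whether the run failed.
Run = Tuple[Set[str], bool]


def rank_predicates(runs: List[Run]) -> List[Tuple[str, float]]:
    """Rank predicates by (rate true in failed runs) - (rate true in successful runs).

    This difference-of-rates score is an illustrative assumption,
    not the scoring method of the work summarized above.
    """
    failed = [preds for preds, did_fail in runs if did_fail]
    passed = [preds for preds, did_fail in runs if not did_fail]
    if not failed or not passed:
        raise ValueError("need at least one failed and one successful run")

    all_preds = set().union(*(preds for preds, _ in runs))
    scores = {}
    for p in all_preds:
        fail_rate = sum(p in preds for preds in failed) / len(failed)
        pass_rate = sum(p in preds for preds in passed) / len(passed)
        scores[p] = fail_rate - pass_rate

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: "ptr == NULL" is true in every failed run and no successful run.
runs = [
    ({"ptr == NULL", "len > 0"}, True),
    ({"ptr == NULL"}, True),
    ({"len > 0"}, False),
    ({"len > 0", "i < n"}, False),
]
ranking = rank_predicates(runs)
```

On this toy data the predicate that separates failed from successful runs rises to the top. A score this simple has no way to account for non-deterministic bugs or for several bugs occurring together, which are the cases the abstract identifies as difficult.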
