SOBER: statistical model-based bug localization

Automated localization of software bugs is one of the essential issues in debugging aids. Previous studies indicated that the evaluation history of program predicates may disclose important clues about underlying bugs. In this paper, we propose a new statistical model-based approach, called SOBER, which localizes software bugs without any prior knowledge of program semantics. Unlike existing statistical debugging approaches that select predicates correlated with program failures, SOBER models evaluation patterns of predicates in both correct and incorrect runs respectively and regards a predicate as bug-relevant if its evaluation pattern in incorrect runs differs significantly from that in correct ones. SOBER features a principled quantification of the pattern difference that measures the bug-relevance of program predicates.We systematically evaluated our approach under the same setting as previous studies. The result demonstrated the power of our approach in bug localization: SOBER can help programmers locate 68 out of 130 bugs in the Siemens suite when programmers are expected to examine no more than 10% of the code, whereas the best previously reported is 52 out of 130. Moreover, with the assistance of SOBER, we found two bugs in bc 1.06 (an arbitrary precision calculator on UNIX/Linux), one of which has never been reported before.

[1]  Andrew D. Booth,et al.  A SIGNED BINARY MULTIPLICATION TECHNIQUE , 1951 .

[2]  E. Lehmann Testing Statistical Hypotheses , 1960 .

[3]  O. L. Macsorley High-Speed Arithmetic in Binary Computers , 1961, Proceedings of the IRE.

[4]  J. Quisquater,et al.  Fast decipherment algorithm for RSA public-key cryptosystem , 1982 .

[5]  Chris J. Mitchell,et al.  Algorithms for software implementations of RSA , 1989 .

[6]  G. Casella,et al.  Statistical Inference , 2003, Encyclopedia of Social Network Analysis and Mining.

[7]  Thomas J. Ostrand,et al.  Experiments on the effectiveness of dataflow- and control-flow-based test adequacy criteria , 1994, Proceedings of 16th International Conference on Software Engineering.

[8]  James R. Larus,et al.  The use of program profiling for software maintenance with applications to the year 2000 problem , 1997, ESEC '97/FSE-5.

[9]  Ernst-Rüdiger Olderog,et al.  Verification of Sequential and Concurrent Programs , 1997, Graduate Texts in Computer Science.

[10]  Gregg Rothermel,et al.  Empirical Studies of a Safe Regression Test Selection Technique , 1998, IEEE Trans. Software Eng..

[11]  Anish Arora,et al.  Book Review: Verification of Sequential and Concurrent Programs by Krzysztof R. Apt and Ernst-Riidiger Olderog (Springer-Verlag New York, 1997) , 1998, SIGA.

[12]  Graham A. Jullien,et al.  An Algorithm for Modular Exponentiation , 1998, Inf. Process. Lett..

[13]  William G. Griswold,et al.  Dynamically discovering likely program invariants to support program evolution , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[14]  Gregg Rothermel,et al.  An empirical investigation of the relationship between spectra differences and regression faults , 2000 .

[15]  Klaus Havelund,et al.  Model checking programs , 2000, Proceedings ASE 2000. Fifteenth IEEE International Conference on Automated Software Engineering.

[16]  Stephan Merz,et al.  Model Checking , 2000 .

[17]  Andreas Zeller,et al.  Visualizing Memory Graphs , 2001, Software Visualization.

[18]  Andy Chou,et al.  Bugs as Inconsistent Behavior: A General Approach to Inferring Errors in Systems Code. , 2001, SOSP 2001.

[19]  Dawson R. Engler,et al.  Bugs as deviant behavior: a general approach to inferring errors in systems code , 2001, SOSP.

[20]  David Leon,et al.  Finding failures by cluster analysis of execution profiles , 2001, Proceedings of the 23rd International Conference on Software Engineering. ICSE 2001.

[21]  D. Engler,et al.  CMC: a pragmatic approach to model checking real code , 2002, OSDI '02.

[22]  A. Zeller Isolating cause-effect chains from computer programs , 2002, SIGSOFT '02/FSE-10.

[23]  John T. Stasko,et al.  Visualization of test information to assist fault localization , 2002, ICSE '02.

[24]  M. Lam,et al.  Tracking down software bugs using automatic anomaly detection , 2002, Proceedings of the 24th International Conference on Software Engineering. ICSE 2002.

[25]  Shriram Krishnamurthi,et al.  Automated Fault Localization Using Potential Invariants , 2003, ArXiv.

[26]  Michael I. Jordan,et al.  Bug isolation via remote program sampling , 2003, PLDI.

[27]  Steven P. Reiss,et al.  Fault localization with nearest neighbor queries , 2003, 18th IEEE International Conference on Automated Software Engineering, 2003. Proceedings..

[28]  Shriram Krishnamurthi,et al.  Automated Fault Localization Using Potential Invariants 1 , 2003 .

[29]  Stephen McCamant,et al.  Predicting problems caused by component upgrades , 2003, ESEC/FSE-11.

[30]  Bin Wang,et al.  Automated support for classifying software failure reports , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[31]  Yuriy Brun,et al.  Finding latent code errors via machine learning over program executions , 2004, Proceedings. 26th International Conference on Software Engineering.

[32]  Michael I. Jordan,et al.  Scalable statistical bug isolation , 2005, PLDI '05.

[33]  Andreas Zeller,et al.  Locating causes of program failures , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..