Has the bug really been fixed?

Software has bugs, and fixing those bugs pervades the software engineering process. It is folklore that bug fixes are often buggy themselves, resulting in bad fixes that either fail to fix the bug or introduce new bugs. To confirm this folklore, we explored the bug databases of the Ant, AspectJ, and Rhino projects, and found that bad fixes comprise as much as 9% of all bugs. Detecting and correcting bad fixes is therefore important for improving the quality and reliability of software, yet no prior work has systematically considered this bad-fix problem, which this paper introduces and formalizes. In particular, the paper formalizes two criteria for determining whether a fix resolves a bug: coverage and disruption. The coverage of a fix measures the extent to which the fix correctly handles all inputs that may trigger the bug, while its disruption measures how far the program deviates from its intended behavior after the fix is applied. The paper also introduces the novel notion of a distance-bounded weakest precondition as the basis for practical techniques to compute the coverage and disruption of a fix. To validate the approach, we implemented Fixation, a prototype that automatically detects bad fixes in Java programs. When Fixation detects a bad fix, it returns an input that still triggers the bug or reports a newly introduced bug; programmers can then use that bug-triggering input to refine or reformulate their fix. We manually extracted bad fixes from real-world projects and evaluated Fixation against them: it successfully detected the extracted bad fixes.
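To make the coverage criterion concrete, consider the following minimal, hypothetical Java sketch (the class, method, and inputs are invented for illustration and are not drawn from the paper or its subject projects). The "fix" handles one bug-triggering input but misses another, so an input that still triggers the bug remains, which is exactly what a detector like Fixation would report.

    public class BadFixExample {
        // Original buggy version (for reference): crashed with a
        // NullPointerException whenever s was null:
        //   static int firstDigit(String s) { return Character.digit(s.charAt(0), 10); }

        // The "fix" guards against null, but its coverage is incomplete:
        // the empty string "" still crashes with a
        // StringIndexOutOfBoundsException, so this is a bad fix.
        static int firstDigit(String s) {
            if (s == null) {
                return -1; // handles one bug-triggering input...
            }
            return Character.digit(s.charAt(0), 10); // ...but not ""
        }

        public static void main(String[] args) {
            System.out.println(firstDigit("42"));  // prints 4
            System.out.println(firstDigit(null));  // prints -1: covered by the fix
            System.out.println(firstDigit(""));    // still crashes: the bug persists
        }
    }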

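For background on the underlying machinery: a weakest precondition wp(S, Q), in Dijkstra's classical sense, is the weakest condition on the initial state that guarantees statement S terminates in a state satisfying postcondition Q. Below is a minimal LaTeX sketch of two standard rules plus one worked step; the paper's distance-bounded variant, which we read (from the name alone) as cutting off this backward propagation at a bounded distance from the fix site, is defined in the paper itself and is not reproduced here.

    \begin{align*}
      wp(x := e,\ Q)         &= Q[e/x]                 && \text{assignment: substitute $e$ for $x$ in $Q$} \\
      wp(S_1;\, S_2,\ Q)     &= wp(S_1,\ wp(S_2,\ Q))  && \text{sequencing: propagate $Q$ backward} \\
      wp(x := x + 1,\ x > 0) &= (x + 1 > 0) \equiv (x \geq 0) && \text{worked example}
    \end{align*}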