A genetic programming approach to automated software repair

Genetic programming is combined with program analysis methods to repair bugs in off-the-shelf legacy C programs. Fitness is defined using negative test cases that exercise the bug to be repaired and positive test cases that encode program requirements. Once a successful repair is discovered, structural differencing algorithms and delta debugging methods are used to minimize its size. Several modifications to the GP technique contribute to its success: (1) genetic operations are localized to the nodes along the execution path of the negative test case; (2) high-level statements are represented as single nodes in the program tree; (3) genetic operators use existing code in other parts of the program, so new code does not need to be invented. The paper describes the method, reviews earlier experiments that repaired 11 bugs in over 60,000 lines of code, reports results on new bug repairs, and describes experiments that analyze the performance and efficacy of the evolutionary components of the algorithm.

[1]  Claire Le Goues,et al.  Automatically finding patches using genetic programming , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[2]  Mayur Naik,et al.  From symptom to cause: localizing errors in counterexample traces , 2003, POPL '03.

[3]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[4]  Thomas M. Pigoski Practical Software Maintenance: Best Practices for Managing Your Software Investment , 1996 .

[5]  Robert V. Binder,et al.  Testing Object-Oriented Systems: Models, Patterns, and Tools , 1999 .

[6]  Stephen McCamant,et al.  Inference and enforcement of data structure consistency specifications , 2006, ISSTA '06.

[7]  Sriram K. Rajamani,et al.  Automatically validating temporal safety properties of interfaces , 2001, SPIN '01.

[8]  George C. Necula,et al.  CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs , 2002, CC.

[9]  Michael I. Jordan,et al.  Bug isolation via remote program sampling , 2003, PLDI.

[10]  Andrea Arcuri,et al.  On the automation of fixing software bugs , 2008, ICSE Companion '08.

[11]  Michael Burrows,et al.  Eraser: a dynamic data race detector for multithreaded programs , 1997, TOCS.

[12]  David H. Ackley,et al.  Modularity and the evolution of software evolvability , 2004 .

[13]  Xin Yao,et al.  A novel co-evolutionary approach to automatic software bug fixing , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[14]  Terry Van Belle,et al.  Code Factoring And The Evolution Of Evolvability , 2002, GECCO.

[15]  Zhendong Su,et al.  Symbolic mining of temporal specifications , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[16]  Stephanie Forrest,et al.  A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.

[17]  Westley Weimer,et al.  Patches as better bug reports , 2006, GPCE '06.

[18]  Andreas Zeller,et al.  Yesterday, my program worked. Today, it does not. Why? , 1999, ESEC/FSE-7.

[19]  Andreas Zeller,et al.  Simplifying and Isolating Failure-Inducing Input , 2002, IEEE Trans. Software Eng..

[20]  John A. Clark,et al.  Multi-objective Improvement of Software Using Co-evolution and Smart Seeding , 2008, SEAL.

[21]  Martin Rinard,et al.  Automatic Data Structure Repair for Self-Healing Systems , 2003 .

[22]  Mary Lou Soffa,et al.  TimeAware test suite prioritization , 2006, ISSTA '06.

[23]  Olga Baysal,et al.  diffX: an algorithm to detect changes in multi-version XML documents , 2005, CASCON.