An analysis of the suitability of test-based patch acceptance criteria

Program repair techniques attempt to fix programs by looking for patches within a search space of fix candidates. These techniques require a specification of the program to be repaired, used as an acceptance criterion for fix candidates, that often also plays an important role in guiding some search processes. Most tools use tests as specifications, which constitutes a risk, since the incompleteness of tests as specifications may lead one to obtain spurious repairs, that pass all tests but are in fact incorrect. This problem has been identified by various researchers, raising concerns about the validity of program fixes. More thorough studies have been proposed using different sets of tests for fix validation, and resorting to manual inspection, showing that while tools reduce their program fixing rate, they are still able to repair a significant number of cases. In this paper, we perform a different analysis of the suitability of tests as acceptance criteria for automated program fixes, by checking patches produced by automated repair tools using a bug-finding tool, as opposed to previous works that used tests or manual inspections. We develop a number of experiments in which faulty programs from a known benchmark are fed to the program repair tools GenProg, Angelix, AutoFix and Nopol, using test suites of varying quality and extension, including those accompanying the benchmark. We then check the produced patches against formal specifications using a bug-finding tool. Our results show that, in general, automated program repair tools are significantly more likely to accept a spurious program fix than producing an actual one, in the studied scenarios.

[1]  Xin Yao,et al.  A novel co-evolutionary approach to automatic software bug fixing , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[2]  Yuriy Brun,et al.  Repairing Programs with Semantic Code Search (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[3]  Alessandro Orso,et al.  MintHint: automated synthesis of repair hints , 2013, ICSE.

[4]  Yuriy Brun,et al.  Is the cure worse than the disease? overfitting in automated program repair , 2015, ESEC/SIGSOFT FSE.

[5]  Roderick Bloem,et al.  Finding and Fixing Faults , 2005, CHARME.

[6]  W. Eric Wong,et al.  Using Mutation to Automatically Suggest Fixes for Faulty Programs , 2010, 2010 Third International Conference on Software Testing, Verification and Validation.

[7]  Abhik Roychoudhury,et al.  DirectFix: Looking for Simple Program Repairs , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[8]  Yuriy Brun,et al.  The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs , 2015, IEEE Transactions on Software Engineering.

[9]  Fan Long,et al.  An analysis of patch plausibility and correctness for generate-and-validate patch generation systems , 2015, ISSTA.

[10]  Abhik Roychoudhury,et al.  Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[11]  Claire Le Goues,et al.  GenProg: A Generic Method for Automatic Software Repair , 2012, IEEE Transactions on Software Engineering.

[12]  Manuel Fähndrich,et al.  Static Verification for Code Contracts , 2010, SAS.

[13]  Jaechang Nam,et al.  Automatic patch generation learned from human-written patches , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[14]  Claire Le Goues,et al.  Automatic program repair with evolutionary computation , 2010, Commun. ACM.

[15]  Martin Monperrus,et al.  IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs , 2016 .

[16]  Fan Long,et al.  Staged program repair with condition synthesis , 2015, ESEC/SIGSOFT FSE.

[17]  Claire Le Goues,et al.  Automatically finding patches using genetic programming , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[18]  Carlo A. Furia,et al.  Automatic C to O-O Translation with C2Eiffel , 2012, 2012 19th Working Conference on Reverse Engineering.

[19]  Bertrand Meyer,et al.  Reconciling Manual and Automated Testing: The AutoTest Experience , 2007, 2007 40th Annual Hawaii International Conference on System Sciences (HICSS'07).

[20]  Fan Long,et al.  An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[21]  Daan Leijen,et al.  Integrating a set of contract checking tools into Visual Studio , 2012, 2012 Second International Workshop on Developing Tools as Plug-Ins (TOPI).

[22]  Nikolai Tillmann,et al.  Pex-White Box Test Generation for .NET , 2008, TAP.

[23]  Dawson R. Engler,et al.  KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs , 2008, OSDI.

[24]  Michael D. Ernst,et al.  An overview of JML tools and applications , 2003, International Journal on Software Tools for Technology Transfer.