Paper information - Overfitting in semantics-based automated program repair

Overfitting in semantics-based automated program repair

The primary goal of Automated Program Repair (APR) is to automatically fix buggy software, reducing the manual bug-fix burden that presently rests on human developers. Existing APR techniques can be broadly divided into two families: semantics-based and heuristics-based. Semantics-based APR uses symbolic execution and test suites to extract semantic constraints, and uses program synthesis to synthesize repairs that satisfy the extracted constraints. Heuristics-based APR generates large populations of repair candidates via source manipulation, and searches for the best among them. Both families largely rely on a primary assumption that a program is correctly patched if the generated patch leads the program to pass all provided test cases. Patch correctness is thus an especially pressing concern. A repair technique may generate overfitting patches, which lead a program to pass all existing test cases but fail to generalize beyond them. In this work, we revisit the overfitting problem with a focus on semantics-based APR techniques, complementing previous studies of the overfitting problem in heuristics-based APR. We perform our study using the IntroClass and Codeflaws benchmarks, two datasets well-suited for assessing repair quality, to systematically characterize and understand the nature of overfitting in semantics-based APR. We find that, as in heuristics-based APR, overfitting also occurs in semantics-based APR, and in various ways.
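The notion of an overfitting patch can be made concrete with a small sketch. Everything below is a hypothetical illustration, not code from the paper or its benchmarks: the `median_*` functions, the two test lists, and the `pass_rate` helper are all assumed names. The sketch shows a patch that passes every test used during repair yet still fails on held-out tests, alongside a patch that generalizes.

```python
# Hypothetical illustration of patch overfitting (assumed names throughout;
# not taken from the paper, IntroClass, or Codeflaws).

def median_buggy(a, b, c):
    """Median of three numbers, with a bug in the final branch."""
    if (a <= b <= c) or (c <= b <= a):
        return b
    if (b <= a <= c) or (c <= a <= b):
        return a
    return a  # bug: should return c


def median_overfit(a, b, c):
    """An overfitting 'patch': it special-cases the one failing repair test."""
    if (a, b, c) == (1, 3, 2):
        return 2
    return median_buggy(a, b, c)


def median_correct(a, b, c):
    """A patch that generalizes: return the middle element after sorting."""
    return sorted((a, b, c))[1]


# Tests available to the repair tool (assumed).
REPAIR_TESTS = [((1, 2, 3), 2), ((3, 2, 1), 2), ((2, 1, 3), 2), ((1, 3, 2), 2)]

# Independent tests never shown to the repair tool (assumed).
HELD_OUT_TESTS = [((5, 9, 7), 7), ((4, 8, 6), 6), ((2, 3, 1), 2)]


def pass_rate(fn, tests):
    """Fraction of (args, expected) pairs for which fn(*args) == expected."""
    passed = sum(1 for args, expected in tests if fn(*args) == expected)
    return passed / len(tests)


if __name__ == "__main__":
    for name, fn in [("buggy", median_buggy),
                     ("overfit", median_overfit),
                     ("correct", median_correct)]:
        print(name,
              "repair:", pass_rate(fn, REPAIR_TESTS),
              "held-out:", pass_rate(fn, HELD_OUT_TESTS))
```

Under the test-passing assumption described above, both `median_overfit` and `median_correct` would count as repairs, since both pass all of `REPAIR_TESTS`. Only an independent test suite distinguishes them, which is one reason held-out tests are a natural way to assess repair quality.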
