Overfitting in semantics-based automated program repair

The primary goal of Automated Program Repair (APR) is to automatically fix buggy software, reducing the manual bug-fixing burden that currently rests on human developers. Existing APR techniques can be broadly divided into two families: semantics-based and heuristics-based. Semantics-based APR uses symbolic execution and test suites to extract semantic constraints, and then uses program synthesis to generate repairs that satisfy those constraints. Heuristics-based APR generates large populations of repair candidates via source manipulation and searches for the best among them. Both families largely rely on the assumption that a program is correctly patched if the generated patch leads the program to pass all provided test cases. Patch correctness is thus an especially pressing concern: a repair technique may generate overfitting patches, which lead a program to pass all existing test cases but fail to generalize beyond them. In this work, we revisit the overfitting problem with a focus on semantics-based APR techniques, complementing previous studies of overfitting in heuristics-based APR. We perform our study using the IntroClass and Codeflaws benchmarks, two datasets well suited for assessing repair quality, to systematically characterize and understand the nature of overfitting in semantics-based APR. We find that, as in heuristics-based APR, overfitting also occurs in semantics-based APR, and in various ways.
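To make the notion of an overfitting patch concrete, the following is a minimal sketch in Python. It is not taken from the study and is not output from any repair tool: the median-of-three function, the injected bug, the patch, and the two test suites are all hypothetical and chosen only for illustration. The patch passes every test used for repair but fails on held-out tests, which is the behavior described above.

```python
from typing import Callable, List, Tuple

# A test case pairs three integer inputs with the expected output.
TestCase = Tuple[Tuple[int, int, int], int]


def reference_median(a: int, b: int, c: int) -> int:
    """Ground-truth oracle (hypothetical): the median of three integers."""
    return sorted((a, b, c))[1]


def buggy_median(a: int, b: int, c: int) -> int:
    """Hypothetical buggy program: wrong whenever c is the strict median."""
    if (a <= b <= c) or (c <= b <= a):
        return b
    if (b <= a <= c) or (c <= a <= b):
        return a
    return b  # bug: this branch should return c


def overfitting_patch(a: int, b: int, c: int) -> int:
    """Hypothetical patch that special-cases the single failing repair test."""
    if (a, b, c) == (1, 3, 2):
        return 2
    return buggy_median(a, b, c)


def passes_all(program: Callable[[int, int, int], int],
               tests: List[TestCase]) -> bool:
    """Return True if the program produces the expected output on every test."""
    return all(program(*inputs) == expected for inputs, expected in tests)


# Test suite given to the (imaginary) repair tool.
repair_tests: List[TestCase] = [
    ((1, 2, 3), 2),
    ((3, 2, 1), 2),
    ((2, 1, 3), 2),
    ((1, 3, 2), 2),  # the only test that exposes the bug
]

# Independent held-out suite, used only to judge whether the patch generalizes.
held_out_tests: List[TestCase] = [
    (inputs, reference_median(*inputs))
    for inputs in [(5, 9, 7), (9, 5, 7), (4, 4, 4)]
]

if __name__ == "__main__":
    print("buggy passes repair tests:  ", passes_all(buggy_median, repair_tests))
    print("patch passes repair tests:  ", passes_all(overfitting_patch, repair_tests))
    print("patch passes held-out tests:", passes_all(overfitting_patch, held_out_tests))
```

Under the test-suite-only assumption, the patch would be accepted as correct because it passes all repair tests. Only the held-out suite reveals that it does not generalize.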
