Identifying Patch Correctness in Test-Based Program Repair

Test-based automatic program repair has attracted a lot of attention in recent years However, the test suites in practice are often too weak to guarantee correctness and existing approaches often generate a large number of incorrect patches. To reduce the number of incorrect patches generated, we propose a novel approach that heuristically determines the correctness of the generated patches. The core idea is to exploit the behavior similarity of test case executions. The passing tests on original and patched programs are likely to behave similarly while the failing tests on original and patched programs are likely to behave differently. Also, if two tests exhibit similar runtime behavior, the two tests are likely to have the same test results. Based on these observations, we generate new test inputs to enhance the test suites and use their behavior similarity to determine patch correctness. Our approach is evaluated on a dataset consisting of 139 patches generated from existing program repair systems including jGenProg, Nopol, jKali, ACS, and HDRepair. Our approach successfully prevented 56.3% of the incorrect patches to be generated, without blocking any correct patches.

[1]  Martin Monperrus,et al.  Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs , 2018, IEEE Transactions on Software Engineering.

[2]  Hao Zhong,et al.  Mining stackoverflow for program repair , 2018, 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER).

[3]  Alexey Zhikhartsev,et al.  Better test cases for better automated program repair , 2017, ESEC/SIGSOFT FSE.

[4]  David Lo,et al.  S3: syntax- and semantic-guided repair synthesis via programming by examples , 2017, ESEC/SIGSOFT FSE.

[5]  Matias Martinez,et al.  Automatic repair of real bugs in java: a large-scale experiment on the defects4j dataset , 2016, Empirical Software Engineering.

[6]  Qi Xin,et al.  Identifying test-suite-overfitted patches through test case generation , 2017, ISSTA.

[7]  Matias Martinez,et al.  Test Case Generation for Program Repair: A Study of Feasibility and Effectiveness , 2017, ArXiv.

[8]  Rahul Gupta,et al.  DeepFix: Fixing Common C Language Errors by Deep Learning , 2017, AAAI.

[9]  Hiroaki Yoshida,et al.  Anti-patterns in search-based program repair , 2016, SIGSOFT FSE.

[10]  Jiachen Zhang,et al.  Precise Condition Synthesis for Program Repair , 2016, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE).

[11]  Matias Martinez,et al.  ASTOR: a program repair library for Java (demo) , 2016, ISSTA.

[12]  Rishabh Singh,et al.  Qlose: Program Repair with Quantitative Objectives , 2016, CAV.

[13]  Miryung Kim,et al.  Trusted Software Repair for System Resiliency , 2016, 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshop (DSN-W).

[14]  Abhik Roychoudhury,et al.  Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[15]  David Lo,et al.  History Driven Program Repair , 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[16]  Fan Long,et al.  An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[17]  Fan Long,et al.  Automatic patch generation by learning correct code , 2016, POPL.

[18]  Hiroyuki Sato,et al.  GRT: Program-Analysis-Guided Random Testing (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[19]  Yuriy Brun,et al.  Repairing Programs with Semantic Code Search (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[20]  Jie Wang,et al.  Fixing Recurring Crash Bugs via Analyzing Q&A Sites (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[21]  Yuriy Brun,et al.  Is the cure worse than the disease? overfitting in automated program repair , 2015, ESEC/SIGSOFT FSE.

[22]  Fan Long,et al.  Staged program repair with condition synthesis , 2015, ESEC/SIGSOFT FSE.

[23]  Krzysztof Czarnecki,et al.  Range Fixes: Interactive Error Resolution for Software Configuration , 2015, IEEE Transactions on Software Engineering.

[24]  Fan Long,et al.  An analysis of patch plausibility and correctness for generate-and-validate patch generation systems , 2015, ISSTA.

[25]  Abhik Roychoudhury,et al.  DirectFix: Looking for Simple Program Repairs , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[26]  Lu Zhang,et al.  Safe Memory-Leak Fixing for C Programs , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[27]  Mark Harman,et al.  The Oracle Problem in Software Testing: A Survey , 2015, IEEE Transactions on Software Engineering.

[28]  Sunghun Kim,et al.  Automatically generated patches as debugging aids: a human study , 2014, SIGSOFT FSE.

[29]  Michael D. Ernst,et al.  Defects4J: a database of existing faults to enable controlled testing studies for Java programs , 2014, ISSTA 2014.

[30]  Yuhua Qi,et al.  The strength of random search on automated program repair , 2014, ICSE.

[31]  Shin Yoo,et al.  Ask the Mutants: Mutating Faulty Programs for Fault Localization , 2014, 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation.

[32]  Cristian Cadar,et al.  KATCH: high-coverage testing of software patches , 2013, ESEC/FSE 2013.

[33]  E. Allen Emerson,et al.  Cost-Aware Automatic Program Repair , 2013, SAS.

[34]  Christian von Essen,et al.  Program repair without regret , 2013, Formal Methods Syst. Des..

[35]  Jaechang Nam,et al.  Automatic patch generation learned from human-written patches , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[36]  Dawei Qi,et al.  SemFix: Program repair via semantic analysis , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[37]  Claire Le Goues,et al.  A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[38]  Yves Le Traon,et al.  Using Mutants to Locate "Unknown" Faults , 2012, 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation.

[39]  Sumit Gulwani,et al.  Automated feedback generation for introductory programming assignments , 2012, PLDI.

[40]  Roderick Bloem,et al.  Automated error localization and correction for imperative programs , 2011, 2011 Formal Methods in Computer-Aided Design (FMCAD).

[41]  Gordon Fraser,et al.  EvoSuite: automatic test suite generation for object-oriented software , 2011, ESEC/FSE '11.

[42]  Corina S. Pasareanu,et al.  Symbolic PathFinder: symbolic execution of Java bytecode , 2010, ASE.

[43]  Andreas Zeller,et al.  Mutation-Driven Generation of Unit Tests and Oracles , 2010, IEEE Transactions on Software Engineering.

[44]  Claire Le Goues,et al.  Automatically finding patches using genetic programming , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[45]  Stephen McCamant,et al.  The Daikon system for dynamic detection of likely invariants , 2007, Sci. Comput. Program..

[46]  Michael D. Ernst,et al.  Randoop: feedback-directed random testing for Java , 2007, OOPSLA '07.

[47]  Michael D. Ernst,et al.  Feedback-Directed Random Test Generation , 2007, 29th International Conference on Software Engineering (ICSE'07).

[48]  Alexander G. Gray,et al.  On-line anomaly detection of deployed software: a statistical machine learning approach , 2006, SOQUA '06.

[49]  Westley Weimer,et al.  Patches as better bug reports , 2006, GPCE '06.

[50]  Alessandro Orso,et al.  Applying classification techniques to remotely-collected program execution data , 2005, ESEC/FSE-13.

[51]  James A. Jones,et al.  Visualization of test information to assist fault localization , 2002, Proceedings of the 24th International Conference on Software Engineering. ICSE 2002.

[52]  David Leon,et al.  Pursuing failure: the distribution of program failures in a profile space , 2001, ESEC/FSE-9.

[53]  William G. Griswold,et al.  Dynamically discovering likely program invariants to support program evolution , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[54]  Gregg Rothermel,et al.  An empirical investigation of program spectra , 1998, PASTE '98.

[55]  James R. Larus,et al.  The use of program profiling for software maintenance with applications to the year 2000 problem , 1997, ESEC '97/FSE-5.

[56]  Matias Martinez ASTOR: A Program Repair Library for Java , 2016 .

[57]  Cheng Zhang,et al.  Automated Test Oracles: A Survey , 2015, Adv. Comput..

[58]  Kathryn T. Stolee,et al.  Repairing Programs with Semantic Code Search , 2015 .

[59]  Claire Le Goues,et al.  GenProg: A Generic Method for Automatic Software Repair , 2012, IEEE Transactions on Software Engineering.