Understanding Automatically-Generated Patches Through Symbolic Invariant Differences

Developer trust is a major barrier to the deployment of automatically-generated patches. Understanding the effect of a patch is a key element of that trust. We find that differences in sets of formal invariants characterize patch differences and that implication-based distances in invariant space characterize patch similarities. When one patch is similar to another it often contains the same changes as well as additional behavior; this pattern is well-captured by logical implication. We can measure differences using a theorem prover to verify implications between invariants implied by separate programs. Although effective, theorem provers are computationally intensive; we find that string distance is an efficient heuristic for implication-based distance measurements. We propose to use distances between patches to construct a hierarchy highlighting patch similarities. We evaluated this approach on over 300 patches and found that it correctly categorizes programs into semantically similar clusters. Clustering programs reduces human effort by reducing the number of semantically distinct patches that must be considered by over 50%, thus reducing the time required to establish trust in automatically generated repairs.

[1]  Matias Martinez,et al.  ASTOR: a program repair library for Java (demo) , 2016, ISSTA.

[2]  Michael D. Ernst,et al.  Defects4J: a database of existing faults to enable controlled testing studies for Java programs , 2014, ISSTA 2014.

[3]  Matias Martinez ASTOR: A Program Repair Library for Java , 2016 .

[4]  Sarfraz Khurshid,et al.  Specification-Based Program Repair Using SAT , 2011, TACAS.

[5]  Fan Long,et al.  Automatic patch generation by learning correct code , 2016, POPL.

[6]  Mark Harman,et al.  SapFix: Automated End-to-End Repair at Scale , 2019, 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP).

[7]  Jaechang Nam,et al.  Automatic patch generation learned from human-written patches , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[8]  Martin Monperrus,et al.  Automatic Software Repair , 2018, ACM Comput. Surv..

[9]  Westley Weimer,et al.  A human study of patch maintainability , 2012, ISSTA 2012.

[10]  Shlomo Moran,et al.  Optimal implementations of UPGMA and other common clustering algorithms , 2007, Inf. Process. Lett..

[11]  Sriram K. Rajamani,et al.  Automatically validating temporal safety properties of interfaces , 2001, SPIN '01.

[12]  Joseph B. Lyons,et al.  A Descriptive Model of Computer Code Trustworthiness , 2017 .

[13]  Gonzalo Navarro,et al.  A guided tour to approximate string matching , 2001, CSUR.

[14]  Nikolaj Bjørner,et al.  Z3: An Efficient SMT Solver , 2008, TACAS.

[15]  Dawei Qi,et al.  SemFix: Program repair via semantic analysis , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[16]  Claire Le Goues,et al.  GenProg: A Generic Method for Automatic Software Repair , 2012, IEEE Transactions on Software Engineering.

[17]  Yuriy Brun,et al.  Is the cure worse than the disease? overfitting in automated program repair , 2015, ESEC/SIGSOFT FSE.

[18]  John R. Woodward,et al.  Fixing bugs in your sleep: how genetic improvement became an overnight success , 2017, GECCO.

[19]  Claire Le Goues,et al.  Leveraging Program Invariants to Promote Population Diversity in Search-Based Automatic Program Repair , 2019, 2019 IEEE/ACM International Workshop on Genetic Improvement (GI).

[20]  Wolfgang Banzhaf,et al.  ARJA: Automated Repair of Java Programs via Multi-Objective Genetic Programming , 2017, IEEE Transactions on Software Engineering.

[21]  Stephen McCamant,et al.  The Daikon system for dynamic detection of likely invariants , 2007, Sci. Comput. Program..

[22]  Westley Weimer,et al.  Formal Verification by Reverse Synthesis , 2008, SAFECOMP.

[23]  Yuriy Brun,et al.  The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs , 2015, IEEE Transactions on Software Engineering.

[24]  Fan Long,et al.  An analysis of patch plausibility and correctness for generate-and-validate patch generation systems , 2015, ISSTA.