Automatic repair of real bugs in java: a large-scale experiment on the defects4j dataset

Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J comes with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to explore the effectiveness of automatic test-suite based repair on Defects4J. The result of our experiment shows that the considered state-of-the-art repair methods can generate patches for 47 out of 224 bugs. However, those patches are only test-suite adequate, which means that they pass the test suite and may potentially be incorrect beyond the test-suite satisfaction correctness criterion. We have manually analyzed 84 different patches to assess their real correctness. In total, 9 real Java bugs can be correctly repaired with test-suite based repair. This analysis shows that test-suite based repair suffers from under-specified bugs, for which trivial or incorrect patches still pass the test suite. With respect to practical applicability, it takes on average 14.8 minutes to find a patch. The experiment was done on a scientific grid, totaling 17.6 days of computation time. All the repair systems and experimental results are publicly available on Github in order to facilitate future research on automatic repair.

[1]  Gary C. Brown A cure worse than the disease? , 1996, America.

[2]  John T. Stasko,et al.  Visualization of test information to assist fault localization , 2002, ICSE '02.

[3]  Gregg Rothermel,et al.  Supporting Controlled Experimentation with Testing Techniques: An Infrastructure and its Potential Impact , 2005, Empirical Software Engineering.

[4]  Yuanyuan Zhou,et al.  BugBench: Benchmarks for Evaluating Bug Detection Tools , 2005 .

[5]  Franck Cappello,et al.  Grid'5000: A Large Scale And Highly Reconfigurable Experimental Grid Testbed , 2006, Int. J. High Perform. Comput. Appl..

[6]  Thomas Zimmermann,et al.  Extraction of bug localization benchmarks from history , 2007, ASE.

[7]  A.J.C. van Gemund,et al.  On the Accuracy of Spectrum-based Fault Localization , 2007, Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007).

[8]  Thomas J. Ostrand,et al.  \{PROMISE\} Repository of empirical software engineering data , 2007 .

[9]  Xin Yao,et al.  A novel co-evolutionary approach to automatic software bug fixing , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[10]  Erica Mealy,et al.  BegBunch: benchmarking for C bug detection tools , 2009, DEFECTS '09.

[11]  Zhendong Su,et al.  Has the bug really been fixed? , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[12]  Sumit Gulwani,et al.  Oracle-guided component-based program synthesis , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[13]  W. Eric Wong,et al.  Using Mutation to Automatically Suggest Fixes for Faulty Programs , 2010, 2010 Third International Conference on Software Testing, Verification and Validation.

[14]  Shan Lu,et al.  Automated atomicity-violation fixing , 2011, PLDI '11.

[15]  Westley Weimer,et al.  A human study of patch maintainability , 2012, ISSTA 2012.

[16]  Claire Le Goues,et al.  GenProg: A Generic Method for Automatic Software Repair , 2012, IEEE Transactions on Software Engineering.

[17]  Claire Le Goues,et al.  A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[18]  Frank Tip,et al.  Automated repair of HTML generation errors in PHP applications using string constraint solving , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[19]  Dawei Qi,et al.  SemFix: Program repair via semantic analysis , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[20]  Matias Martinez,et al.  Mining software repair models for reasoning on the search space of automated program fixing , 2013, Empirical Software Engineering.

[21]  Yuhua Qi,et al.  Efficient Automated Program Repair through Fault-Recorded Testing Prioritization , 2013, 2013 IEEE International Conference on Software Maintenance.

[22]  Jaechang Nam,et al.  Automatic patch generation learned from human-written patches , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[23]  Westley Weimer,et al.  Leveraging program equivalence for adaptive program repair: Models and first results , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[24]  Fan Long,et al.  Automatic runtime error repair and containment via recovery shepherding , 2014, PLDI.

[25]  Automated Fixing of Programs with Contracts , 2010, IEEE Transactions on Software Engineering.

[26]  Matias Martinez,et al.  Do the fix ingredients already exist? an empirical inquiry into the redundancy assumptions of program repair approaches , 2014, ICSE Companion.

[27]  Michael D. Ernst,et al.  Defects4J: a database of existing faults to enable controlled testing studies for Java programs , 2014, ISSTA 2014.

[28]  Martin Monperrus,et al.  A critical review of "automatic patch generation learned from human-written patches": essay on the problem statement and the evaluation of automatic software repair , 2014, ICSE.

[29]  Yuhua Qi,et al.  The strength of random search on automated program repair , 2014, ICSE.

[30]  Sunghun Kim,et al.  Automatically generated patches as debugging aids: a human study , 2014, SIGSOFT FSE.

[31]  Martin Monperrus,et al.  Test case purification for improving fault localization , 2014, SIGSOFT FSE.

[32]  Sarfraz Khurshid,et al.  Data-guided repair of selection statements , 2014, ICSE.

[33]  Michael D. Ernst,et al.  Are mutants a valid substitute for real faults in software testing? , 2014, SIGSOFT FSE.

[34]  Martin Monperrus,et al.  Automatic repair of buggy if conditions and missing preconditions with SMT , 2014, CSTVA 2014.

[35]  Fan Long,et al.  An analysis of patch plausibility and correctness for generate-and-validate patch generation systems , 2015, ISSTA.

[36]  Bixin Li,et al.  Experience report: How do techniques, programs, and tests impact automated program repair? , 2015, 2015 IEEE 26th International Symposium on Software Reliability Engineering (ISSRE).

[37]  Zhendong Su,et al.  An Empirical Study on Real Bug Fixes , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[38]  Fan Long,et al.  Staged program repair with condition synthesis , 2015, ESEC/SIGSOFT FSE.

[39]  Abhik Roychoudhury,et al.  DirectFix: Looking for Simple Program Repairs , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[40]  Hadi Hemmati,et al.  Test case analytics: Mining test case traces to improve risk-driven testing , 2015, 2015 IEEE 1st International Workshop on Software Analytics (SWAN).

[41]  Yuriy Brun,et al.  Is the cure worse than the disease? overfitting in automated program repair , 2015, ESEC/SIGSOFT FSE.

[42]  Abhik Roychoudhury,et al.  relifix: Automated Repair of Software Regressions , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[43]  Yuriy Brun,et al.  The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs , 2015, IEEE Transactions on Software Engineering.

[44]  Fan Long,et al.  An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[45]  Fan Long,et al.  Automatic patch generation by learning correct code , 2016, POPL.

[46]  Matias Martinez,et al.  ASTOR: a program repair library for Java (demo) , 2016, ISSTA.

[47]  Martin Monperrus,et al.  Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs , 2018, IEEE Transactions on Software Engineering.