Paper information - Automatic Repair of Real Bugs in Java: A Large-Scale Experiment on the Defects4J Dataset

Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J comes with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment that explores the effectiveness of automatic test-suite based repair on Defects4J. The results show that the considered state-of-the-art repair methods can generate patches for 47 out of 224 bugs. However, those patches are only test-suite adequate: they pass the test suite, but they may still be incorrect, because passing the test suite is the only correctness criterion they are guaranteed to satisfy. We manually analyzed 84 different patches to assess their real correctness. In total, 9 real Java bugs can be correctly repaired with test-suite based repair. This analysis shows that test-suite based repair suffers from under-specified bugs, for which trivial or incorrect patches still pass the test suite. With respect to practical applicability, it takes on average 14.8 minutes to find a patch. The experiment was run on a scientific grid, totaling 17.6 days of computation time. All the repair systems and experimental results are publicly available on GitHub to facilitate future research on automatic repair.
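To make the notion of a test-suite adequate but incorrect patch concrete, here is a minimal hypothetical sketch. It is not taken from Defects4J or from the repair systems evaluated in the paper. The class `OverfittingDemo`, its methods, and the two-case test suite are all invented for illustration. The suite checks only `abs(5)` and `abs(-3)`, so a patch that special-cases the failing input passes it just as a genuine fix does.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical illustration only (not a Defects4J bug or a real repair tool's output).
// Intended behavior: return the absolute value of x.
class OverfittingDemo {

    // Buggy method: returns negative inputs unchanged.
    static int absBuggy(int x) {
        return x;
    }

    // Overfitted "patch": special-cases the only failing input in the suite
    // instead of handling all negative numbers.
    static int absOverfitted(int x) {
        if (x == -3) {
            return 3;
        }
        return x;
    }

    // Genuine fix (same behavior as Math.abs).
    static int absCorrect(int x) {
        return x < 0 ? -x : x;
    }

    // Under-specified test suite: one passing case and one failing case.
    static boolean passesTestSuite(IntUnaryOperator abs) {
        return abs.applyAsInt(5) == 5 && abs.applyAsInt(-3) == 3;
    }

    public static void main(String[] args) {
        System.out.println("buggy passes:      " + passesTestSuite(OverfittingDemo::absBuggy));
        System.out.println("overfitted passes: " + passesTestSuite(OverfittingDemo::absOverfitted));
        System.out.println("correct passes:    " + passesTestSuite(OverfittingDemo::absCorrect));
        // An input the suite never exercises exposes the overfitted patch.
        System.out.println("overfitted abs(-7) = " + absOverfitted(-7));
    }
}
```

Running `main` shows that the buggy version fails the suite, both patches pass it, and the overfitted patch still returns -7 for `abs(-7)`. Only an analysis that goes beyond the test suite can tell the two patches apart.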