Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset

Automatic software repair aims to reduce human effort for fixing bugs. Various automatic repair approaches have emerged in recent years. In this paper, we report on an experiment on automatically repairing 224 bugs of a real-world and publicly available bug dataset, Defects4J. We investigate the results of three repair methods, GenProg (repair via random search), Kali (repair via exhaustive search), and Nopol (repair via constraint based search). We conduct our investigation with five research questions: fixability, patch correctness, ill-defined bugs, performance, and fault localizability. Our implementations of GenProg, Kali, and Nopol fix together 41 out of 224 (18%) bugs with 59 different patches. This can be viewed as a baseline for future usage of Defects4J for automatic repair research. In addition, manual analysis of sampling 42 of 59 generated patches shows that only 8 patches are undoubtedly correct. This is a novel piece of evidence that there is large room for improvement in the area of test suite based repair.

[1]  Yuanyuan Zhang,et al.  Search-based software engineering: Trends, techniques and applications , 2012, CSUR.

[2]  Paulo César Masiero,et al.  Mutation testing applied to validate specifications based on statecharts , 1999, Proceedings 10th International Symposium on Software Reliability Engineering (Cat. No.PR00443).

[3]  W. Eric Wong,et al.  Using Mutation to Automatically Suggest Fixes for Faulty Programs , 2010, 2010 Third International Conference on Software Testing, Verification and Validation.

[4]  Erica Mealy,et al.  BegBunch: benchmarking for C bug detection tools , 2009, DEFECTS '09.

[5]  Yuhua Qi,et al.  The strength of random search on automated program repair , 2014, ICSE.

[6]  Yuriy Brun,et al.  The plastic surgery hypothesis , 2014, SIGSOFT FSE.

[7]  Abhik Roychoudhury,et al.  DirectFix: Looking for Simple Program Repairs , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[8]  John T. Stasko,et al.  Visualization of test information to assist fault localization , 2002, ICSE '02.

[9]  Matias Martinez,et al.  Mining software repair models for reasoning on the search space of automated program fixing , 2013, Empirical Software Engineering.

[10]  Fan Long,et al.  Automatic patch generation via learning from successful human patches , 2015 .

[11]  Gregg Rothermel,et al.  Supporting Controlled Experimentation with Testing Techniques: An Infrastructure and its Potential Impact , 2005, Empirical Software Engineering.

[12]  Xin Yao,et al.  A novel co-evolutionary approach to automatic software bug fixing , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[13]  Mary Jean Harrold,et al.  Empirical evaluation of the tarantula automatic fault-localization technique , 2005, ASE.

[14]  Thomas Zimmermann,et al.  Extraction of bug localization benchmarks from history , 2007, ASE.

[15]  Sumit Gulwani,et al.  Oracle-guided component-based program synthesis , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[16]  Franck Cappello,et al.  Grid'5000: A Large Scale And Highly Reconfigurable Experimental Grid Testbed , 2006, Int. J. High Perform. Comput. Appl..

[17]  Sarfraz Khurshid,et al.  Data-guided repair of selection statements , 2014, ICSE.

[18]  Yuriy Brun,et al.  Is the cure worse than the disease? overfitting in automated program repair , 2015, ESEC/SIGSOFT FSE.

[19]  Fan Long,et al.  Staged program repair with condition synthesis , 2015, ESEC/SIGSOFT FSE.

[20]  Yuanyuan Zhou,et al.  BugBench: Benchmarks for Evaluating Bug Detection Tools , 2005 .

[21]  Fan Long,et al.  Automatic runtime error repair and containment via recovery shepherding , 2014, PLDI.

[22]  Jaechang Nam,et al.  Automatic patch generation learned from human-written patches , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[23]  Yuriy Brun,et al.  The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs , 2015, IEEE Transactions on Software Engineering.

[24]  Martin Monperrus,et al.  Automatic repair of buggy if conditions and missing preconditions with SMT , 2014, CSTVA 2014.

[25]  Michael D. Ernst,et al.  Defects4J: a database of existing faults to enable controlled testing studies for Java programs , 2014, ISSTA 2014.

[26]  Frank Tip,et al.  Automated repair of HTML generation errors in PHP applications using string constraint solving , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[27]  Fan Long,et al.  An analysis of patch plausibility and correctness for generate-and-validate patch generation systems , 2015, ISSTA.

[28]  Martin Monperrus,et al.  A critical review of "automatic patch generation learned from human-written patches": essay on the problem statement and the evaluation of automatic software repair , 2014, ICSE.

[29]  Abhik Roychoudhury,et al.  relifix: Automated Repair of Software Regressions , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[30]  A.J.C. van Gemund,et al.  On the Accuracy of Spectrum-based Fault Localization , 2007, Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007).

[31]  Emerson R. Murphy-Hill,et al.  The Design Space of Bug Fixes and How Developers Navigate It , 2015, IEEE Transactions on Software Engineering.

[32]  Matias Martinez,et al.  Do the fix ingredients already exist? an empirical inquiry into the redundancy assumptions of program repair approaches , 2014, ICSE Companion.

[33]  Michael D. Ernst,et al.  Are mutants a valid substitute for real faults in software testing? , 2014, SIGSOFT FSE.

[34]  Claire Le Goues,et al.  Current challenges in automatic software repair , 2013, Software Quality Journal.

[35]  Martin Monperrus,et al.  Test case purification for improving fault localization , 2014, SIGSOFT FSE.

[36]  Zhendong Su,et al.  Has the bug really been fixed? , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[37]  Gordon Fraser,et al.  Do Automatically Generated Unit Tests Find Real Faults? An Empirical Study of Effectiveness and Challenges (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[38]  Dawei Qi,et al.  SemFix: Program repair via semantic analysis , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[39]  Andreas Zeller,et al.  Lightweight bug localization with AMPLE , 2005, AADEBUG'05.

[40]  Claire Le Goues,et al.  A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[41]  Claire Le Goues,et al.  Representations and operators for improving evolutionary software repair , 2012, GECCO '12.

[42]  Shan Lu,et al.  Automated atomicity-violation fixing , 2011, PLDI '11.

[43]  Peter Zoeteweij,et al.  Spectrum-Based Multiple Fault Localization , 2009, 2009 IEEE/ACM International Conference on Automated Software Engineering.

[44]  Matias Martinez,et al.  ASTOR: Evolutionary Automatic Software Repair for Java , 2014, ArXiv.

[45]  Claire Le Goues,et al.  GenProg: A Generic Method for Automatic Software Repair , 2012, IEEE Transactions on Software Engineering.

[46]  Yuhua Qi,et al.  Efficient Automated Program Repair through Fault-Recorded Testing Prioritization , 2013, 2013 IEEE International Conference on Software Maintenance.

[47]  Hadi Hemmati,et al.  Test case analytics: Mining test case traces to improve risk-driven testing , 2015, 2015 IEEE 1st International Workshop on Software Analytics (SWAN).

[48]  Shin Yoo,et al.  Evolving Human Competitive Spectra-Based Fault Localisation Techniques , 2012, SSBSE.

[49]  Westley Weimer,et al.  A human study of patch maintainability , 2012, ISSTA 2012.

[50]  Zhendong Su,et al.  An Empirical Study on Real Bug Fixes , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[51]  Sunghun Kim,et al.  Automatically generated patches as debugging aids: a human study , 2014, SIGSOFT FSE.