Precfix: Large-Scale Patch Recommendation by Mining Defect-Patch Pairs

Patch recommendation is the process of identifying errors in software systems and suggesting suitable fixes for them. Patch recommendation can significantly improve developer productivity by reducing both the debugging and repairing time. Existing techniques usually rely on complete test suites and detailed debugging reports, which are often absent in practical industrial settings. In this paper, we propose Precfix, a pragmatic approach targeting large-scale industrial codebase and making recommendations based on previously observed debugging activities. Precfix collects defect-patch pairs from development histories, performs clustering, and extracts generic reusable patching patterns as recommendations. We conducted experimental study on an industrial codebase with 10K projects involving diverse defect patterns. We managed to extract 3K templates of defect-patch pairs, which have been successfully applied to the entire codebase. Our approach is able to make recommendations within milliseconds and achieves a false positive rate of 22% confirmed by manual review. The majority (10/12) of the interviewed developers appreciated Precfix, which has been rolled out to Alibaba to support various critical businesses.

[1]  Alessandro Orso,et al.  MintHint: automated synthesis of repair hints , 2013, ICSE.

[2]  Abhik Roychoudhury,et al.  A correlation study between automated program repair and test-suite metrics , 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[3]  Charles Zhang,et al.  Grail: context-aware fixing of concurrency bugs , 2014, SIGSOFT FSE.

[4]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[5]  Denys Poshyvanyk,et al.  SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair , 2018, IEEE Transactions on Software Engineering.

[6]  Steven P. Reiss,et al.  Fault localization with nearest neighbor queries , 2003, 18th IEEE International Conference on Automated Software Engineering, 2003. Proceedings..

[7]  Andreas Zeller,et al.  Generating Fixes from Object Behavior Anomalies , 2009, 2009 IEEE/ACM International Conference on Automated Software Engineering.

[8]  Michael D. Ernst,et al.  Defects4J: a database of existing faults to enable controlled testing studies for Java programs , 2014, ISSTA 2014.

[9]  Frank Tip,et al.  Automated repair of HTML generation errors in PHP applications using string constraint solving , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[10]  Andreas Zeller,et al.  Predicting faults from cached history , 2008, ISEC '08.

[11]  Sunghun Kim,et al.  Automatically generated patches as debugging aids: a human study , 2014, SIGSOFT FSE.

[12]  J. Rubin,et al.  Semantic Slicing of Software Version Histories , 2018, IEEE Transactions on Software Engineering.

[13]  Jaechang Nam,et al.  Automatic patch generation learned from human-written patches , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[14]  Monperrus Martin Automatic Software Repair: a Bibliography , 2020 .

[15]  A.J.C. van Gemund,et al.  On the Accuracy of Spectrum-based Fault Localization , 2007, Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007).

[16]  Daniel S. Hirschberg,et al.  Algorithms for the Longest Common Subsequence Problem , 1977, JACM.

[17]  Shan Lu,et al.  Understanding and generating high quality patches for concurrency bugs , 2016, SIGSOFT FSE.

[18]  Belur V. Dasarathy,et al.  Nearest neighbor (NN) norms: NN pattern classification techniques , 1991 .

[19]  Dawei Qi,et al.  SemFix: Program repair via semantic analysis , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[20]  Marsha Chechik,et al.  Precise semantic history slicing through dynamic delta refinement , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[21]  Marsha Chechik,et al.  FHistorian: Locating Features in Version Histories , 2017, SPLC.

[22]  Xiangyu Zhang,et al.  Locating faults through automated predicate switching , 2006, ICSE.

[23]  Hongyu Zhang,et al.  Shaping program repair space with existing patches and similar code , 2018, ISSTA.

[24]  Rajiv Gupta,et al.  BugFix: A learning-based tool to assist developers in fixing bugs , 2009, 2009 IEEE 17th International Conference on Program Comprehension.

[25]  Jian Li,et al.  Software Defect Prediction via Convolutional Neural Network , 2017, 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS).

[26]  Fan Long,et al.  Automatic patch generation by learning correct code , 2016, POPL.

[27]  Matias Martinez,et al.  A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark , 2018, 2019 IEEE 1st International Workshop on Intelligent Bug Fixing (IBF).

[28]  David Lo,et al.  Search-based fault localization , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[29]  Jacques Klein,et al.  iFixR: bug report driven program repair , 2019, ESEC/SIGSOFT FSE.

[30]  Claire Le Goues,et al.  Automatically finding patches using genetic programming , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[31]  Mary Jean Harrold,et al.  Empirical evaluation of the tarantula automatic fault-localization technique , 2005, ASE.

[32]  Helmut Neukirchen Survey and Performance Evaluation of DBSCAN Spatial Clustering Implementations for Big Data and High-Performance Computing Paradigms , 2016 .

[33]  Johannes Bader,et al.  Getafix: learning to fix bugs automatically , 2019, Proc. ACM Program. Lang..

[34]  Abdelwahab Hamou-Lhadj,et al.  CLEVER: Combining Code Metrics with Clone Detection for Just-in-Time Fault Prevention and Resolution in Large Industrial Projects , 2018, 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR).

[35]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[36]  Michael D. Ernst,et al.  An Empirical Study of Fault Localization Families and Their Combinations , 2018, IEEE Transactions on Software Engineering.

[37]  Joseph Robert Horgan,et al.  Fault localization using execution slices and dataflow tests , 1995, Proceedings of Sixth International Symposium on Software Reliability Engineering. ISSRE'95.

[38]  Guido Schryen,et al.  A Comprehensive and Comparative Analysis of the Patching Behavior of Open Source and Closed Source Software Vendors , 2009, 2009 Fifth International Conference on IT Security Incident Management and IT Forensics.

[39]  Rui Abreu,et al.  A Survey on Software Fault Localization , 2016, IEEE Transactions on Software Engineering.

[40]  Michael D. Ernst,et al.  Evaluating and Improving Fault Localization , 2017, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE).

[41]  Andreas Zeller,et al.  When do changes induce fixes? , 2005, ACM SIGSOFT Softw. Eng. Notes.

[42]  Shin Yoo,et al.  Ask the Mutants: Mutating Faulty Programs for Fault Localization , 2014, 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation.

[43]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[44]  S. Niwattanakul,et al.  Using of Jaccard Coefficient for Keywords Similarity , 2022 .

[45]  Jian Zhou,et al.  Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[46]  Yuriy Brun,et al.  Repairing Programs with Semantic Code Search (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[47]  Daniela Micucci,et al.  Automatic Software Repair: A Survey , 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[48]  Yves Le Traon,et al.  Metallaxis‐FL: mutation‐based fault localization , 2015, Softw. Test. Verification Reliab..

[49]  Premkumar T. Devanbu,et al.  BugCache for inspections: hit or miss? , 2011, ESEC/FSE '11.

[50]  Lu Zhang,et al.  Safe Memory-Leak Fixing for C Programs , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[51]  Lu Zhang,et al.  Boosting Bug-Report-Oriented Fault Localization with Segmentation and Stack-Trace Analysis , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[52]  Kathryn T. Stolee,et al.  Repairing Programs with Semantic Code Search , 2015 .

[53]  Fan Long,et al.  An analysis of patch plausibility and correctness for generate-and-validate patch generation systems , 2015, ISSTA.

[54]  Gurmeet Singh Manku,et al.  Detecting near-duplicates for web crawling , 2007, WWW '07.

[55]  Elizabeth Chang,et al.  Open Source and Closed Source Software Development Methodologies , 2004, ICSE 2004.

[56]  Rongxin Wu,et al.  CrashLocator: locating crashing faults based on crash stacks , 2014, ISSTA 2014.