Review4Repair: Code Review Aided Automatic Program Repairing

Context: Learning-based automatic program repair techniques show promise in providing quality fix suggestions for bugs detected in source code. These tools mostly exploit historical data of buggy and fixed code changes, and they depend heavily on bug localizers when applied to a new piece of code. With the increasing popularity of code review, this dependency on bug localizers can be reduced. Moreover, code review-based bug localization is more trustworthy, since the suggestions reflect the reviewers' expertise and experience. Objective: The natural language instructions written in review comments are a rich source of information about a bug's nature and its expected solution. However, to the best of our knowledge, no learning-based tool has used review comments to fix programming bugs. In this study, we investigate how much code review comments improve the performance of repair techniques. Method: We train a sequence-to-sequence model on 55,060 code reviews and their associated code changes. We also introduce new tokenization and preprocessing approaches that help achieve a significant improvement over state-of-the-art learning-based repair techniques. Results: We boost top-1 accuracy by 20.33% and top-10 accuracy by 34.82%. Our approach can also suggest fixes for stylistic and non-code errors that prior techniques leave unaddressed. Conclusion: We believe that automatic fix suggestions generated by our approach and delivered along with the code review would help developers address review comments quickly and correctly, saving them time and effort.
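
The top-1 and top-10 figures reported above are ranking metrics: a prediction counts as a hit when the correct fix appears among the model's first k ranked candidates. The following is a minimal sketch of how such a metric can be computed, not the authors' evaluation script. The function name `top_k_accuracy`, the toy data, and the use of exact string match as the correctness criterion are all assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    candidates: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
) -> float:
    """Fraction of examples whose reference fix is among the first k candidates.

    Assumption: a candidate is correct only if it matches the reference
    fix exactly as a string; the abstract does not state the criterion.
    """
    if len(candidates) != len(references):
        raise ValueError("candidates and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        1 for ranked, ref in zip(candidates, references) if ref in ranked[:k]
    )
    return hits / len(references)


# Hypothetical ranked suggestions for three review comments.
ranked_fixes = [
    ["return x;", "return y;"],
    ["i++;", "i += 2;"],
    ["foo();", "bar();"],
]
developer_fixes = ["return x;", "i += 2;", "baz();"]

top1 = top_k_accuracy(ranked_fixes, developer_fixes, k=1)
top2 = top_k_accuracy(ranked_fixes, developer_fixes, k=2)
```

With this toy data, one of the three developer fixes is ranked first and a second one is ranked second, so the score rises as k grows.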
