Jointly Learning to Repair Code and Generate Commit Message

We propose a novel task of jointly repairing program codes and generating commit messages. Code repair and commit message generation are two essential and related tasks for software development. However, existing work usually performs the two tasks independently. We construct a multilingual triple dataset including buggy code, fixed code, and commit messages for this novel task. We provide the cascaded models as baseline, which are enhanced with different training approaches, including the teacher-student method, the multi-task method, and the backtranslation method. To deal with the error propagation problem of the cascaded method, the joint model is proposed that can both repair the code and generate the commit message in a unified framework. Experimental results show that the enhanced cascaded model with teacher-student method and multitask-learning method achieves the best score on different metrics of automated code repair, and the joint model behaves better than the cascaded model on commit message generation.

[1]  Yang Liu,et al.  A Teacher-Student Framework for Zero-Resource Neural Machine Translation , 2017, ACL.

[2]  Dawn Xiaodong Song,et al.  Tree-to-tree Neural Networks for Program Translation , 2018, NeurIPS.

[3]  Aditya Kanade,et al.  Neural Program Repair by Jointly Learning to Localize and Repair , 2019, ICLR.

[4]  Alon Lavie,et al.  Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.

[5]  Tara N. Sainath,et al.  FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .

[6]  Ming Wen,et al.  Context-Aware Patch Generation for Better Automated Program Repair , 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[7]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[8]  Denys Poshyvanyk,et al.  SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair , 2018, IEEE Transactions on Software Engineering.

[9]  Yang Liu,et al.  ATOM: Commit Message Generation Based on Abstract Syntax Tree and Hybrid Ranking , 2019, ArXiv.

[10]  Guillaume Lample,et al.  Unsupervised Translation of Programming Languages , 2020, NeurIPS.

[11]  Furu Wei,et al.  Grammar-Based Patches Generation for Automated Program Repair , 2021, FINDINGS.

[12]  Yutaka Matsuo,et al.  A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes , 2017, ACL.

[13]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[14]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[15]  Guillaume Lample,et al.  Cross-lingual Language Model Pretraining , 2019, NeurIPS.

[16]  Dawei Qi,et al.  SemFix: Program repair via semantic analysis , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[17]  Mario Linares Vásquez,et al.  On Automatically Generating Commit Messages via Summarization of Source Code Changes , 2014, 2014 IEEE 14th International Working Conference on Source Code Analysis and Manipulation.

[18]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[19]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[20]  Martin Wattenberg,et al.  Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.

[21]  Andreas Zeller,et al.  How Long Will It Take to Fix This Bug? , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[22]  Neel Sundaresan,et al.  CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation , 2021, NeurIPS Datasets and Benchmarks.

[23]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[24]  Ming Zhou,et al.  Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study , 2018, ArXiv.

[25]  Trevor Darrell,et al.  Robust Change Captioning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Abhik Roychoudhury,et al.  Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[27]  Gabriele Bavota,et al.  Learning How to Mutate Source Code from Bug-Fixes , 2018, 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[28]  Neel Sundaresan,et al.  PyMT5: Multi-mode Translation of Natural Language and Python Code with Transformers , 2020, EMNLP.

[29]  Harald C. Gall,et al.  Populating a Release History Database from version control and bug tracking systems , 2003, International Conference on Software Maintenance, 2003. ICSM 2003. Proceedings..

[30]  Xiaonan Luo,et al.  Mining Version Control System for Automatically Generating Commit Comment , 2017, 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

[31]  Shan Lu,et al.  Automated atomicity-violation fixing , 2011, PLDI '11.

[32]  Feng Xu,et al.  Commit Message Generation for Source Code Changes , 2019, IJCAI.

[33]  Yue Zhang,et al.  End-to-End Neural Relation Extraction with Global Optimization , 2017, EMNLP.

[34]  Neel Sundaresan,et al.  IntelliCode compose: code generation using transformer , 2020, ESEC/SIGSOFT FSE.

[35]  Gabriele Bavota,et al.  An Empirical Investigation into Learning Bug-Fixing Patches in the Wild via Neural Machine Translation , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[36]  Collin McMillan,et al.  Automatically generating commit messages from diffs using neural machine translation , 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[37]  Percy Liang,et al.  Graph-based, Self-Supervised Program Repair from Diagnostic Feedback , 2020, ICML.

[38]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[39]  David Lo,et al.  Deep Code Comment Generation , 2018, 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC).

[40]  Ming Zhou,et al.  GraphCodeBERT: Pre-training Code Representations with Data Flow , 2020, ICLR.

[41]  Xiaocheng Feng,et al.  CodeBERT: A Pre-Trained Model for Programming and Natural Languages , 2020, EMNLP.

[42]  Xiaodong Gu,et al.  Deep Code Search , 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[43]  Zijian Li,et al.  TAG : Type Auxiliary Guiding for Code Comment Generation , 2020, ACL.

[44]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[45]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[46]  Jing Wang,et al.  Fault Localization Analysis Based on Deep Neural Network , 2016 .

[47]  Zhenchang Xing,et al.  Neural-Machine-Translation-Based Commit Message Generation: How Far Are We? , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[48]  Rico Sennrich,et al.  Edinburgh Neural Machine Translation Systems for WMT 16 , 2016, WMT.

[49]  Le Song,et al.  Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs , 2020, ICLR.

[50]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[51]  Ming Zhou,et al.  CodeBLEU: a Method for Automatic Evaluation of Code Synthesis , 2020, ArXiv.

[52]  Westley Weimer,et al.  Automatically documenting program changes , 2010, ASE.

[53]  Zenglin Xu,et al.  Contextualized Code Representation Learning for Commit Message Generation , 2020, Neurocomputing.