A Sketch-Based Neural Model for Generating Commit Messages from Diffs

Commit messages play an important role in software development, especially in large teams. Developers with different writing styles often contribute to the same project, which makes it difficult to maintain a consistent standard of informative commit messages; the most frequent issue is that messages are not descriptive enough. In this paper, we apply neural machine translation (NMT) techniques to convert code diffs into commit messages, and we present an improved sketch-based encoder for this task. Our approach has three parts. First, we look for a more suitable NMT baseline for this problem. Second, we show that the performance of NMT models can be improved by training on examples containing a specific file type. Third, we introduce a novel sketch-based neural model inspired by recent approaches to code generation, and we show that the sketch-based encoder significantly outperforms existing state-of-the-art solutions. Results on two datasets introduced in recent years for this task show that the improvement is especially pronounced for Java source code files.
