Neural Edit Completion

We address the problem of predicting edit completions based on a learned model that was trained on past edits. Given a code snippet that is partially edited, our goal is to predict a completion of the edit for the rest of the snippet. We refer to this task as the EditCompletion task and present a novel approach for tackling it. The main idea is to directly represent structural edits. This allows us to model the likelihood of the edit itself, rather than learning the likelihood of the edited code. We represent an edit operation as a path in the program's Abstract Syntax Tree (AST), leading from the source of the edit to its target. Using this representation, we present a powerful and lightweight neural model for the EditCompletion task. We conduct a thorough evaluation, comparing our approach to a variety of representation and modeling alternatives driven by strong models such as LSTMs, Transformers, and neural CRFs. Our experiments show that our model achieves a 28% relative gain over state-of-the-art sequential models and 2$\times$ higher accuracy than syntactic models that learn to generate the edited code instead of modeling the edits directly. We make our code, dataset, and trained models publicly available.
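To make the path-based representation concrete, the following is a minimal Python sketch of extracting a path between two AST nodes: the path climbs from the source node to the lowest common ancestor and descends to the target node. The toy Node class, the ↑/↓ direction markers, and the example edit are our own illustration under simplifying assumptions, not the paper's exact encoding or implementation.

```python
# Illustrative sketch: an edit between two AST locations is described by the
# path between them (source -> lowest common ancestor -> target), in the
# spirit of AST-path representations. All names here are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # identity-based equality; nodes form a cyclic parent/child graph
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

def add_child(parent: Node, child: Node) -> Node:
    child.parent = parent
    parent.children.append(child)
    return child

def path_to_root(node: Node) -> List[Node]:
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain

def ast_path(source: Node, target: Node) -> List[str]:
    """Labels along the path from source up to the LCA, then down to target."""
    src_up = path_to_root(source)
    tgt_up = path_to_root(target)
    tgt_ids = {id(n) for n in tgt_up}
    up, lca = [], None
    for n in src_up:
        if id(n) in tgt_ids:   # first common ancestor on the way up
            lca = n
            break
        up.append(n.label + "↑")
    down = [n.label + "↓" for n in reversed(tgt_up[: tgt_up.index(lca)])]
    return up + [lca.label] + down

# Toy AST for `x = f(y)`:
assign = Node("Assign")
lhs = add_child(assign, Node("Name:x"))
call = add_child(assign, Node("Call"))
add_child(call, Node("Name:f"))
arg = add_child(call, Node("Name:y"))

# An edit relating the argument `y` to the assignment target `x` is
# represented by the AST path between the two locations:
print(ast_path(arg, lhs))
# ['Name:y↑', 'Call↑', 'Assign', 'Name:x↓']
```

In the full model, sequences of such paths would be fed to a neural encoder to score candidate edits; the sketch above only illustrates the representation itself, not the learning component.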
