Code2Text: Dual Attention Syntax Annotation Networks for Structure-Aware Code Translation

Translating source code into natural language text helps people understand programs better and faster. Previous code translation methods mainly exploit human-specified syntax rules. Since handcrafted syntax rules are expensive to obtain and not always available, a PL-independent automatic code translation method is much more desirable. However, existing sequence translation methods generally regard source text as a plain sequence, which cannot capture the rich hierarchical structure inherent in code. In this work, we exploit the abstract syntax tree (AST), which summarizes the hierarchical information of a code snippet, to build a structure-aware code translation method. We propose a syntax annotation network called Code2Text that incorporates both the source code and its AST into the translation. Code2Text features dual encoders, one for the sequential input (code) and one for the structural input (AST). We also propose a novel dual-attention mechanism to guide the decoding process by accurately aligning each output word with both the tokens in the source code and the nodes in the AST. Experiments on a public collection of Python code demonstrate that Code2Text achieves better performance than several state-of-the-art methods, and that its generated text is accurate and human-readable.
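As a rough illustration of the dual-attention idea described above, the sketch below shows one decoding step that attends separately over code-token encoder states and AST-node encoder states, then fuses the two context vectors with the decoder state. This is a minimal hypothetical sketch using dot-product (Luong-style) scoring, not the paper's actual implementation; all names and shapes are illustrative.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_attention_step(h, code_states, ast_states):
    """One decoding step with dual attention (illustrative sketch).

    h           : (d,)      current decoder hidden state
    code_states : (T_c, d)  encoder states over source-code tokens
    ast_states  : (T_a, d)  encoder states over AST nodes
    Returns the fused vector a decoder could use to predict the next word.
    """
    # Alignment weights against each input view, computed independently
    a_code = softmax(code_states @ h)   # (T_c,) weights over code tokens
    a_ast = softmax(ast_states @ h)     # (T_a,) weights over AST nodes
    # One context vector per encoder
    c_code = a_code @ code_states       # (d,)
    c_ast = a_ast @ ast_states          # (d,)
    # Fuse both contexts with the decoder state
    return np.concatenate([c_code, c_ast, h])  # (3d,)

# Toy usage: 4 code tokens, 3 AST nodes, hidden size 5
rng = np.random.default_rng(0)
ctx = dual_attention_step(rng.normal(size=5),
                          rng.normal(size=(4, 5)),
                          rng.normal(size=(3, 5)))
print(ctx.shape)  # (15,)
```

In a full model the two sets of encoder states would come from a sequential encoder (e.g. an LSTM over tokens) and a structural encoder (e.g. a tree-structured LSTM over the AST), and the fused vector would feed a softmax over the output vocabulary.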
