Embedding API dependency graph for neural code generation

Chen Lyu (corresponding author), E-mail: lvchen@sdnu.edu.cn
Ruyun Wang, E-mail: ruyunw@outlook.com
Hongyu Zhang, E-mail: hongyu.zhang@newcastle.edu.au
Hanwen Zhang, E-mail: zhanghanwen0726@gmail.com
Songlin Hu, E-mail: husonglin@iie.ac.cn

1. School of Information Science and Engineering, Shandong Normal University, Jinan, China.
2. The University of Newcastle, Callaghan, NSW, Australia.
3. Big Data Center of Shandong Province, Jinan, China.
4. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China.
5. School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China.

arXiv:2103.15361v1 [cs.SE] 29 Mar 2021

Code generation from textual program descriptions has long been viewed as a grand challenge in software engineering. In recent years, many deep-learning-based approaches have been proposed that generate a code sequence from a textual program description. However, existing approaches ignore the global relationships among API methods, which are important for understanding API usage. In this paper, we propose to model the dependencies among API methods as an API dependency graph (ADG) and to incorporate the graph embedding into a sequence-to-sequence (Seq2Seq) model. In addition to the existing encoder-decoder structure, a new module named “embedder” is introduced. In this way, the decoder can utilize both global structural dependencies and the textual program description to predict the target code. We conduct extensive code generation experiments on three public datasets and in two programming languages (Python and Java) to evaluate our proposed approach, called ADG-Seq2Seq.
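To make the described architecture concrete, the following is a minimal sketch, not the authors' implementation, of the idea in the abstract: give each API method a graph-derived vector from the ADG, expose those vectors through an extra "embedder" module, and let the Seq2Seq decoder condition on both the encoded textual description and the ADG embedding of previously generated tokens. All module names, dimensions, and the use of GRUs here are illustrative assumptions.

# Minimal ADG-embedding Seq2Seq sketch (PyTorch); sizes and names are hypothetical.
import torch
import torch.nn as nn


class ADGEmbedder(nn.Module):
    """Maps API-method ids to graph-derived vectors. Here it is a trainable table;
    in the paper's setting it would be initialized from the ADG embedding."""

    def __init__(self, num_api_methods: int, graph_dim: int):
        super().__init__()
        self.table = nn.Embedding(num_api_methods, graph_dim)

    def forward(self, api_ids: torch.Tensor) -> torch.Tensor:
        return self.table(api_ids)


class ADGSeq2SeqSketch(nn.Module):
    def __init__(self, nl_vocab: int, code_vocab: int, hidden: int = 256, graph_dim: int = 128):
        super().__init__()
        self.nl_embed = nn.Embedding(nl_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.embedder = ADGEmbedder(code_vocab, graph_dim)   # the additional "embedder" module
        self.code_embed = nn.Embedding(code_vocab, hidden)
        # Decoder input = previous token embedding concatenated with its ADG embedding.
        self.decoder = nn.GRU(hidden + graph_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, code_vocab)

    def forward(self, nl_tokens: torch.Tensor, prev_code_tokens: torch.Tensor) -> torch.Tensor:
        # Encode the textual program description.
        _, h = self.encoder(self.nl_embed(nl_tokens))         # h: (1, batch, hidden)
        # Combine lexical and graph-structural views of the previously emitted tokens.
        dec_in = torch.cat(
            [self.code_embed(prev_code_tokens), self.embedder(prev_code_tokens)], dim=-1
        )
        dec_out, _ = self.decoder(dec_in, h)
        return self.out(dec_out)                              # logits over the code vocabulary


if __name__ == "__main__":
    model = ADGSeq2SeqSketch(nl_vocab=5000, code_vocab=3000)
    nl = torch.randint(0, 5000, (2, 12))    # batch of 2 descriptions, 12 tokens each
    prev = torch.randint(0, 3000, (2, 8))   # teacher-forced previous code tokens
    print(model(nl, prev).shape)            # torch.Size([2, 8, 3000])

The sketch omits attention and the graph-embedding training itself; it only illustrates where a decoder could consume both the textual encoding and per-token ADG vectors when predicting the target code.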
