Antecedent Predictions Are Dominant for Tree-Based Code Generation

Code generation aims to automatically convert natural language (NL) utterances into code snippets. Sequence-to-tree (Seq2Tree) methods such as TRANX have been proposed for code generation; they guarantee the compilability of the generated code by producing each subsequent Abstract Syntax Tree (AST) node conditioned on the antecedent predictions of AST nodes. Existing Seq2Tree methods tend to treat antecedent and subsequent predictions equally. However, under the AST constraints, it is difficult for Seq2Tree models to produce correct subsequent predictions from incorrect antecedent predictions, so antecedent predictions ought to receive more attention than subsequent ones. To this end, we propose an effective method named APTRANX (Antecedent Prioritized TRANX), built on top of TRANX. APTRANX introduces an Antecedent Prioritized (AP) Loss, which drives the model to attach more importance to antecedent predictions by exploiting the position information of the generated AST nodes. With better antecedent predictions and the subsequent predictions that build on them, APTRANX significantly improves performance. Extensive experiments on several benchmark datasets demonstrate the superiority and generality of our method over state-of-the-art baselines.
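To make the idea concrete, below is a minimal sketch of a position-weighted training objective in the spirit of the AP Loss. It assumes a PyTorch setting and an exponentially decaying weight over the AST generation steps; the function name antecedent_prioritized_loss, the decay parameter, and the exact weighting schedule are illustrative assumptions, not the formulation used in APTRANX.

```python
import torch
import torch.nn.functional as F


def antecedent_prioritized_loss(logits: torch.Tensor,
                                targets: torch.Tensor,
                                decay: float = 0.9) -> torch.Tensor:
    """Position-weighted cross-entropy over one AST generation sequence.

    logits  : (T, V) unnormalized scores for the T generated AST nodes/actions
    targets : (T,)   gold node/action indices
    decay   : factor in (0, 1]; earlier (antecedent) steps receive larger
              weights (a hypothetical schedule, see the note above)
    """
    # per-step cross-entropy, one value per generated AST node
    per_step = F.cross_entropy(logits, targets, reduction="none")  # shape (T,)
    # weight w_t = decay**t decreases with the generation step t, so
    # antecedent predictions dominate the training signal
    steps = torch.arange(targets.size(0), device=logits.device, dtype=logits.dtype)
    weights = decay ** steps
    return (weights * per_step).sum() / weights.sum()


# toy usage: 5 decoding steps over a vocabulary of 10 AST actions
logits = torch.randn(5, 10, requires_grad=True)
targets = torch.randint(0, 10, (5,))
loss = antecedent_prioritized_loss(logits, targets)
loss.backward()
```

Under such a weighting, a mistake on an early (antecedent) node incurs a larger penalty than a mistake on a late (subsequent) node, matching the intuition that incorrect antecedent predictions make the rest of the derivation hard to recover.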

[1] Fandong Meng, et al. Exploring Dynamic Selection of Branch Expansion Orders for Code Generation. ACL, 2021.

[2] Jianwei Cui, et al. Improving Tree-Structured Decoder Training for Code Generation via Mutual Learning. AAAI, 2021.

[3] Graham Neubig, et al. Incorporating External Knowledge through Pre-training for Natural Language to Code Generation. ACL, 2020.

[4] Lili Mou, et al. TreeGen: A Tree-Based Transformer Architecture for Code Generation. AAAI, 2019.

[5] Xin Xia, et al. Code Generation as a Dual Task of Code Summarization. NeurIPS, 2019.

[6] Graham Neubig, et al. Reranking for Neural Semantic Parsing. ACL, 2019.

[7] Oleksandr Polozov, et al. Program Synthesis and Semantic Parsing with Learned Code Idioms. NeurIPS, 2019.

[8] Lili Mou, et al. A Grammar-Based Structural CNN Decoder for Code Generation. AAAI, 2018.

[9] Graham Neubig, et al. TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation. EMNLP, 2018.

[10] Graham Neubig, et al. Retrieval-Based Neural Code Generation. EMNLP, 2018.

[11] Graham Neubig, et al. Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow. MSR, 2018.

[12] Mirella Lapata, et al. Coarse-to-Fine Decoding for Neural Semantic Parsing. ACL, 2018.

[13] Lukasz Kaiser, et al. Attention Is All You Need. NIPS, 2017.

[14] Dan Klein, et al. Abstract Syntax Networks for Code Generation and Semantic Parsing. ACL, 2017.

[15] Tommi S. Jaakkola, et al. Tree-structured Decoding with Doubly-Recurrent Neural Networks. ICLR, 2016.

[16] Wang Ling, et al. Latent Predictor Networks for Code Generation. ACL, 2016.

[17] Mirella Lapata, et al. Language to Logical Form with Neural Attention. ACL, 2016.

[18] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[19] Raymond J. Mooney, et al. Learning to Parse Database Queries Using Inductive Logic Programming. AAAI/IAAI, 1996.

[20] George R. Doddington, et al. The ATIS Spoken Language Systems Pilot Corpus. HLT, 1990.

[21] Yubin Ge, et al. An AST Structure Enhanced Decoder for Code Generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022.