CodeeGAN: Code Generation via Adversarial Training

Automatic code generation is an important research problem in machine learning. Generative Adversarial Networks (GANs) have shown a powerful ability in image generation. However, generating code with GANs is so far unexplored, because the discrete output of a language model hinders the application of gradient-based GANs. In this paper, we propose a model called CodeeGAN that generates code via adversarial training. First, because the data produced by the generative model is discrete, the generator cannot be adjusted directly by gradient descent; we therefore adopt the policy gradient method from Reinforcement Learning (RL) to handle discrete data. Second, we use Monte Carlo Tree Search (MCTS) to build a rollout network that evaluates the loss of generated tokens. Based on these two mechanisms, we build the CodeeGAN model. We evaluate the model on datasets from four different platforms. Our model outperforms existing work, indicating that adversarial training is an effective and efficient approach to code generation.
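The two mechanisms named in the abstract can be illustrated with a small, self-contained sketch. It is not the CodeeGAN implementation: the vocabulary, the target sequence, the position-wise generator `theta`, and `toy_discriminator` (a fixed scoring function standing in for a trained discriminator) are all hypothetical, and plain Monte Carlo rollouts are used here as a simplification in place of MCTS. The sketch only shows how a policy-gradient (REINFORCE) update lets a reward signal reach a generator whose outputs are discrete tokens.

```python
import math
import random

# Toy vocabulary and target sequence (assumptions for illustration only).
VOCAB = ["x", "=", "1", "+", "2"]
TARGET = [0, 1, 2, 3, 4]  # token indices for "x = 1 + 2"


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(probs, rng):
    """Draw one token index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def toy_discriminator(tokens):
    """Hypothetical stand-in for a trained discriminator:
    the fraction of positions that match TARGET."""
    return sum(a == b for a, b in zip(tokens, TARGET)) / len(TARGET)


def rollout_reward(theta, prefix, rng, n_rollouts=4):
    """Estimate the reward of a partial sequence by completing it
    n_rollouts times with the current generator and averaging the
    discriminator scores (plain Monte Carlo rollouts, not MCTS)."""
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(prefix)
        for t in range(len(prefix), len(TARGET)):
            seq.append(sample_token(softmax(theta[t]), rng))
        total += toy_discriminator(seq)
    return total / n_rollouts


def train(steps=3000, lr=0.2, seed=0):
    """REINFORCE on a position-wise softmax generator.
    theta[t][v] is the logit of token v at position t."""
    rng = random.Random(seed)
    theta = [[0.0] * len(VOCAB) for _ in TARGET]
    for _ in range(steps):
        seq = []
        for t in range(len(TARGET)):
            probs = softmax(theta[t])
            a = sample_token(probs, rng)  # discrete, non-differentiable choice
            seq.append(a)
            q = rollout_reward(theta, seq, rng)  # reward estimate for this token
            # Gradient of log pi(a) w.r.t. the logits is onehot(a) - probs.
            for v in range(len(VOCAB)):
                grad_logp = (1.0 if v == a else 0.0) - probs[v]
                theta[t][v] += lr * q * grad_logp
    return theta


def expected_reward(theta):
    """Exact expected toy-discriminator score under the generator."""
    hits = sum(softmax(theta[t])[TARGET[t]] for t in range(len(TARGET)))
    return hits / len(TARGET)
```

Calling `expected_reward(train())` gives the expected score of the trained toy generator, which can be compared with the 0.2 score of the untrained, uniform generator. In a full adversarial setup the discriminator would be a learned model trained in alternation with the generator; here it is fixed to keep the example short.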
