A Transformer-Based Variational Autoencoder for Sentence Generation

The variational autoencoder (VAE) has proved to be an effective generative model, but its applications to natural language tasks have not been fully developed. This paper presents a novel variational autoencoder for natural text generation. In contrast to the previously introduced variational autoencoder for natural text, in which both the encoder and decoder are RNN-based, we propose a Transformer-based architecture and augment the decoder with an LSTM language-model layer to fully exploit the information in the latent variables. We also propose methods to address training problems such as KL divergence collapse and model degradation. In our experiments, we evaluate the model with random sampling and linear interpolation. The results show that the sentences generated by our approach are more meaningful and that their semantics are more coherent in the latent space.
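
The following is a minimal sketch (an assumption, not the authors' released code) of the kind of architecture described above: a Transformer encoder that parameterizes the approximate posterior over a latent code, and a Transformer decoder whose output is passed through an LSTM language-model layer before the softmax. All layer sizes and pooling choices here are illustrative.

```python
import torch
import torch.nn as nn


class TransformerVAE(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=3,
                 latent_dim=32, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)        # learned positional embeddings
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)      # posterior mean
        self.to_logvar = nn.Linear(d_model, latent_dim)  # posterior log-variance
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.latent_to_memory = nn.Linear(latent_dim, d_model)
        self.lstm_lm = nn.LSTM(d_model, d_model, batch_first=True)  # LM layer on top of the decoder
        self.out = nn.Linear(d_model, vocab_size)

    def _embed(self, tokens):
        pos_ids = torch.arange(tokens.size(1), device=tokens.device)
        return self.embed(tokens) + self.pos(pos_ids)

    def forward(self, src_tokens, tgt_tokens):
        src = self._embed(src_tokens)                    # (B, S, d_model)
        tgt = self._embed(tgt_tokens)                    # (B, T, d_model)
        h = self.encoder(src).mean(dim=1)                # pool encoder states
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        memory = self.latent_to_memory(z).unsqueeze(1)   # latent code as decoder memory
        causal = torch.triu(
            torch.full((tgt.size(1), tgt.size(1)), float("-inf"), device=tgt.device),
            diagonal=1,
        )
        dec = self.decoder(tgt, memory, tgt_mask=causal)
        lm_out, _ = self.lstm_lm(dec)                    # LSTM consumes decoder states step by step
        logits = self.out(lm_out)                        # (B, T, vocab_size)
        return logits, mu, logvar
```

The KL-collapse problem mentioned above is commonly mitigated by annealing the weight on the KL term of the objective. The formulation below is the standard annealed VAE objective; the exact schedule used in the paper may differ.

```latex
% Annealed VAE objective (to be maximized): \beta(t) is ramped from 0 to 1
% over training so the KL term does not collapse to zero early on.
\mathcal{L}(\theta, \phi; x) =
    \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
    - \beta(t)\, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
```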
