Stochastic Wasserstein Autoencoder for Probabilistic Sentence Generation

The variational autoencoder (VAE) imposes a probabilistic distribution (typically Gaussian) on the latent space and penalizes the Kullback-Leibler (KL) divergence between the posterior and the prior. In NLP, VAEs are extremely difficult to train because the KL term tends to collapse to zero: one has to apply heuristics such as KL weight annealing and word dropout in a carefully engineered manner to train a VAE for text successfully. In this paper, we propose to use the Wasserstein autoencoder (WAE) for probabilistic sentence generation, where the encoder can be either stochastic or deterministic. We show, theoretically and empirically, that in the original WAE the stochastically encoded Gaussian distribution tends to degenerate into a Dirac-delta function, and we propose a variant of the WAE that encourages the stochasticity of the encoder. Experimental results show that the latent space learned by the WAE exhibits the continuity and smoothness of a VAE's latent space, while simultaneously achieving much higher BLEU scores for sentence reconstruction.
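
To make the contrast concrete, the sketch below compares the VAE's per-example KL regularizer with a WAE-style penalty that matches samples from a stochastic Gaussian encoder to the prior. This is a minimal PyTorch illustration, not the paper's exact objective: the helper names (`kl_to_standard_normal`, `mmd_penalty`, `wae_style_loss`) and the choice of an RBF-kernel MMD penalty are assumptions made for exposition.

```python
# Minimal sketch (assumed PyTorch implementation), contrasting the VAE's per-example
# KL penalty with a WAE-style penalty that matches posterior samples to the prior.
# Helper names and the RBF-kernel MMD are illustrative, not the paper's formulation.
import torch


def kl_to_standard_normal(mu, logvar):
    # VAE regularizer: KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch.
    return 0.5 * torch.mean(torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=-1))


def rbf_kernel(a, b, sigma=1.0):
    # RBF kernel matrix between two batches of latent samples.
    return torch.exp(-torch.cdist(a, b).pow(2) / (2.0 * sigma ** 2))


def mmd_penalty(z_q, z_p, sigma=1.0):
    # Biased MMD^2 estimate between encoder samples z_q and prior samples z_p.
    return (rbf_kernel(z_q, z_q, sigma).mean()
            + rbf_kernel(z_p, z_p, sigma).mean()
            - 2.0 * rbf_kernel(z_q, z_p, sigma).mean())


def wae_style_loss(recon_nll, mu, logvar, lam=10.0):
    # Stochastic Gaussian encoder: reparameterized sample z ~ N(mu, diag(exp(logvar))).
    z_q = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    z_p = torch.randn_like(z_q)  # samples from the N(0, I) prior
    # Unlike the KL term above, this penalty acts only on samples, so the encoder's
    # variance can shrink toward zero (the Dirac-delta degeneracy noted in the abstract)
    # unless stochasticity is explicitly encouraged.
    return recon_nll + lam * mmd_penalty(z_q, z_p)
```

In this sketch, recon_nll would be the sequence decoder's negative log-likelihood of the input sentence; the point of the contrast is only to show where a sample-based WAE penalty differs from the VAE's KL term.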
