Fixing Gaussian Mixture VAEs for Interpretable Text Generation

Variational auto-encoders (VAEs) with Gaussian priors are effective for text generation. To improve controllability and interpretability, we propose using a Gaussian mixture distribution as the prior of the VAE (GMVAE), since it introduces an extra discrete latent variable alongside the continuous one. Unfortunately, training a GMVAE with the standard variational approximation often leads to the mode-collapse problem. We theoretically analyze the root cause: maximizing the evidence lower bound (ELBO) of the GMVAE implicitly aggregates the means of the multiple Gaussian priors. We therefore propose Dispersed-GMVAE (DGMVAE), an improved model for text generation that introduces two extra terms to alleviate mode collapse and to induce a better-structured latent space. Experimental results show that DGMVAE outperforms strong baselines on several language-modeling and text-generation benchmarks.
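To make the mode-collapse mechanism concrete, the following is a minimal numerical sketch, not the paper's actual implementation. It shows a Monte-Carlo estimate of the KL term between a Gaussian posterior and a uniform Gaussian-mixture prior (the term whose maximization pulls the component means together), plus an illustrative dispersion-style regularizer that rewards pairwise separation of the prior means. All function names (`mixture_kl_mc`, `dispersion_penalty`) and the exact form of the penalty are hypothetical and chosen for clarity; the paper's actual extra terms may differ.

```python
import numpy as np

def logsumexp(a, axis=0):
    # numerically stable log-sum-exp
    m = np.max(a, axis=axis)
    return m + np.log(np.sum(np.exp(a - m), axis=axis))

def log_gauss(z, mu, var):
    # log density of an isotropic Gaussian N(mu, var * I)
    d = z.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * var)
                   + np.sum((z - mu) ** 2, axis=-1) / var)

def mixture_kl_mc(z_samples, q_mu, q_var, prior_mus, prior_var=1.0):
    """Monte-Carlo estimate of KL(q(z|x) || p(z)) for a uniform
    Gaussian-mixture prior p(z) = (1/K) * sum_k N(mu_k, I).
    Minimizing this w.r.t. the prior means tends to pull the
    mu_k toward the posterior mean, i.e. toward one another."""
    log_q = log_gauss(z_samples, q_mu, q_var)
    # log p(z) via log-sum-exp over the K mixture components
    comp = np.stack([log_gauss(z_samples, mu, prior_var) for mu in prior_mus])
    log_p = logsumexp(comp, axis=0) - np.log(len(prior_mus))
    return np.mean(log_q - log_p)

def dispersion_penalty(prior_mus):
    """Illustrative regularizer: more negative when the component
    means are farther apart, so adding it to the loss discourages
    the means from collapsing onto a single mode."""
    mus = np.asarray(prior_mus, dtype=float)
    diffs = mus[:, None, :] - mus[None, :, :]
    return -np.mean(np.sum(diffs ** 2, axis=-1))
```

With a single prior component equal to the posterior, the KL estimate is exactly zero, and the penalty strictly decreases as the means spread out, which is the qualitative behavior the two extra DGMVAE terms are designed to exploit.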
