Latent Space Energy-Based Model of Symbol-Vector Coupling for Text Generation and Classification

We propose a latent space energy-based prior model for text generation and classification. The model builds on a generator network that generates the text sequence from a continuous latent vector. The energy term of the prior model couples the continuous latent vector with a symbolic one-hot vector, so that the discrete category can be inferred from an observed example via the continuous latent vector. This latent space coupling naturally admits an information bottleneck regularization that encourages the continuous latent vector to extract information from the observed example that is informative of the underlying category. In our learning method, the symbol-vector coupling, the generator network, and the inference network are learned jointly. The model can be learned in an unsupervised setting, where no category labels are provided, as well as in a semi-supervised setting, where category labels are provided for a subset of the training examples. Our experiments demonstrate that the proposed model learns a well-structured and meaningful latent space, which (1) guides the generator to produce text with high quality, diversity, and interpretability, and (2) effectively classifies text.
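
To make the symbol-vector coupling concrete, the following is a minimal sketch of such a joint prior, written in the spirit of the description above (the energy head $f_\alpha$, the category count $K$, and the base distribution $p_0$ are assumed names for this illustration, not necessarily the exact parameterization of the model):

\[
p_\alpha(y, z) \;=\; \frac{1}{Z_\alpha}\, \exp\!\big(\langle y, f_\alpha(z)\rangle\big)\, p_0(z),
\]

where $y \in \{0,1\}^K$ is the symbolic one-hot vector, $z \in \mathbb{R}^d$ is the continuous latent vector, $f_\alpha(z) \in \mathbb{R}^K$ are category logits produced by a small network, and $Z_\alpha$ is the normalizing constant. Summing out $y$ gives the marginal prior $p_\alpha(z) \propto \big(\sum_{k=1}^{K} \exp f_\alpha(z)_k\big)\, p_0(z)$, and conditioning gives the softmax classifier $p_\alpha(y_k = 1 \mid z) = \operatorname{softmax}(f_\alpha(z))_k$, which is the mechanism by which the discrete category can be inferred from the continuous latent vector.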
