SentenceMIM: A Latent Variable Language Model

We introduce sentenceMIM, a probabilistic auto-encoder for language modelling, trained with the Mutual Information Machine (MIM) learning framework. Previous attempts to learn variational auto-encoders (VAEs) for language data have had mixed success, with empirical performance well below that of state-of-the-art auto-regressive models; a key barrier has been posterior collapse. The recently proposed MIM framework encourages high mutual information between observations and latent variables, and is more robust against posterior collapse. This paper formulates a MIM model for text data, along with a corresponding learning algorithm. We demonstrate excellent perplexity (PPL) results on several datasets, and show that the framework learns a rich latent space, allowing interpolation between sentences of different lengths with a fixed-dimensional latent representation. We also demonstrate the versatility of sentenceMIM by applying a trained model to question answering, a transfer learning task, without fine-tuning. To the best of our knowledge, this is the first latent variable model (LVM) for text modelling that achieves performance competitive with non-LVM models.
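
For intuition, here is a minimal sketch of the kind of objective MIM learning optimizes for an auto-encoder: the average of the encoder log-joint, log q(z|x)P(x), and the decoder log-joint, log p(x|z)P(z), estimated here with latent samples drawn only from the encoder, as in asymmetric MIM sampling. This PyTorch snippet is an illustration based on the published MIM objective, not the authors' implementation; the function name a_mim_loss and the tensor shapes are assumptions made for the sake of the example.

    import math
    import torch
    import torch.nn.functional as F

    LOG2PI = math.log(2.0 * math.pi)

    def a_mim_loss(recon_logits, targets, mu, logvar):
        # Hypothetical MIM-style loss for a sentence auto-encoder.
        # recon_logits: (batch, seq_len, vocab) decoder scores for p(x|z)
        # targets:      (batch, seq_len) target token ids
        # mu, logvar:   (batch, latent_dim) parameters of q(z|x)

        # Reparameterized sample z ~ q(z|x).
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)

        # log p(x|z): token-level log-likelihood under the decoder.
        log_px_z = -F.cross_entropy(
            recon_logits.transpose(1, 2), targets, reduction="none"
        ).sum(dim=1)

        # log q(z|x): diagonal-Gaussian log-density of the sampled z.
        log_qz_x = -0.5 * (logvar + (z - mu) ** 2 / logvar.exp() + LOG2PI).sum(dim=1)

        # log P(z): standard-normal prior log-density.
        log_pz = -0.5 * (z ** 2 + LOG2PI).sum(dim=1)

        # Average of the encoder and decoder log-joints; the log P(x)
        # term is constant in the model parameters and is dropped.
        return -0.5 * (log_px_z + log_pz + log_qz_x).mean()

Note the contrast with the VAE: log q(z|x) enters this loss with the opposite sign of the corresponding entropy term in the negative ELBO, so minimizing it rewards concentrated (low-entropy) posteriors and, with them, high mutual information between observations and latent codes. This is the property that makes MIM more resistant to posterior collapse.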
