Generative Text Modeling through Short Run Inference

Latent variable models for text, when trained successfully, accurately model the data distribution and capture global semantic and syntactic features of sentences. The prominent approach to training such models is the variational autoencoder (VAE). VAEs are nevertheless challenging to train and often converge to a trivial local optimum in which the latent variable is ignored and its posterior collapses to the prior, an issue known as posterior collapse. Various techniques have been proposed to mitigate this issue, most of which focus on improving the inference model so that it yields latent codes of higher quality. The present work instead proposes a short run dynamics for inference: it is initialized from the prior distribution of the latent variable and then runs a small number (e.g., 20) of Langevin dynamics steps guided by the posterior distribution. The major advantage of our method is that it requires neither a separate inference model nor an assumption of simple posterior geometry, yielding an automatic, natural, and flexible inference engine. We show that models trained with short run dynamics model the data more accurately than strong language model and VAE baselines and exhibit no sign of posterior collapse. Analyses of the latent space show that interpolation between latent codes generates coherent sentences with smooth transitions, and that latent features obtained from unsupervised pretraining improve classification over strong baselines. Together, these results expose a well-structured latent space in our generative model.
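To make the inference procedure concrete, the sketch below illustrates short run Langevin dynamics for posterior sampling in the spirit described above: the latent code is initialized from the standard normal prior and updated for a fixed, small number of Langevin steps using the gradient of the unnormalized posterior log p(z) + log p(x | z). The function name, the `decoder_log_lik` callable, and the step size are hypothetical placeholders, not the paper's exact implementation.

```python
import torch

def short_run_langevin(x, decoder_log_lik, z_dim, n_steps=20, step_size=0.3):
    """Short run Langevin inference (illustrative sketch).

    x               -- batch of observations, shape (batch, ...)
    decoder_log_lik -- hypothetical callable returning log p(x | z) per example
    z_dim           -- dimensionality of the latent code
    """
    # Initialize latent codes from the prior N(0, I).
    z = torch.randn(x.size(0), z_dim)
    for _ in range(n_steps):
        z = z.detach().requires_grad_(True)
        # Unnormalized posterior: log p(z) + log p(x | z), up to a constant.
        log_prior = -0.5 * (z ** 2).sum(dim=1)
        log_joint = (log_prior + decoder_log_lik(x, z)).sum()
        grad = torch.autograd.grad(log_joint, z)[0]
        # Langevin update: gradient step plus Gaussian noise.
        z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
    return z.detach()
```

Because the chain is short and restarted from the prior for every example, no persistent inference network is needed; the decoder's own gradients guide the latent codes toward the posterior.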
