Avoiding Latent Variable Collapse With Generative Skip Models

Variational autoencoders (VAEs) learn distributions over high-dimensional data. They model data with a deep latent-variable model and then fit the model by maximizing a lower bound on the log marginal likelihood. VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially when the likelihood model is powerful. Specifically, the lower bound involves an approximate posterior over the latent variables; this posterior "collapses" when it is set equal to the prior, i.e., when the approximate posterior is independent of the data. While VAEs learn good generative models, latent variable collapse prevents them from learning useful representations. In this paper, we propose a simple new way to avoid latent variable collapse: we include skip connections in the generative model, which enforce strong links between the latent variables and the likelihood function. We study generative skip models both theoretically and empirically. Theoretically, we prove that skip models increase the mutual information between the observations and the inferred latent variables. Empirically, we study images (MNIST and Omniglot) and text (Yahoo). Compared to existing VAE architectures, we show that generative skip models maintain similar predictive performance but suffer less collapse and provide more meaningful representations of the data.
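To make the architectural idea concrete, here is a minimal sketch of a decoder with generative skip connections. It is not the authors' implementation; it assumes a small PyTorch MLP decoder for binarized images (e.g. MNIST), with hypothetical layer names and sizes. The latent code z is concatenated into every hidden layer, so each layer of the likelihood keeps a direct path to z instead of seeing it only through the first layer.

```python
import torch
import torch.nn as nn


class SkipDecoder(nn.Module):
    """Generative network with skip connections from the latent z to every layer.

    Illustrative sketch: each hidden layer receives the previous hidden state
    *and* the latent code, so a powerful likelihood cannot easily ignore z.
    """

    def __init__(self, latent_dim=32, hidden_dim=256, data_dim=784):
        super().__init__()
        self.fc_in = nn.Linear(latent_dim, hidden_dim)
        # Each subsequent layer takes [h, z] as input: the skip connection.
        self.fc_h1 = nn.Linear(hidden_dim + latent_dim, hidden_dim)
        self.fc_h2 = nn.Linear(hidden_dim + latent_dim, hidden_dim)
        self.fc_out = nn.Linear(hidden_dim + latent_dim, data_dim)

    def forward(self, z):
        h = torch.relu(self.fc_in(z))
        h = torch.relu(self.fc_h1(torch.cat([h, z], dim=-1)))
        h = torch.relu(self.fc_h2(torch.cat([h, z], dim=-1)))
        # Bernoulli logits for binarized pixels.
        return self.fc_out(torch.cat([h, z], dim=-1))
```

A standard VAE decoder would feed z only into the first layer; the extra `torch.cat([h, z], ...)` terms are the skip connections that, per the paper's argument, keep the mutual information between the observations and the inferred latent variables from vanishing.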
