Semi-Amortized Variational Autoencoders

Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network. While AVI has enabled efficient training of deep generative models such as variational autoencoders (VAEs), recent empirical work suggests that inference networks can produce suboptimal variational parameters. We propose a hybrid approach: use AVI to initialize the variational parameters, then run stochastic variational inference (SVI) to refine them. Crucially, the local SVI procedure is itself differentiable, so the inference network and generative model can be trained end-to-end with gradient-based optimization. This semi-amortized approach enables the use of rich generative models without experiencing the posterior-collapse phenomenon common in training VAEs for problems like text generation. Experiments show this approach outperforms strong autoregressive and variational baselines on standard text and image datasets.
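
Below is a minimal sketch of the semi-amortized training objective, assuming a diagonal-Gaussian variational family and plain gradient steps for the SVI refinement. The `encoder` and `decoder` modules are hypothetical stand-ins (not the paper's exact architecture), with `decoder(x, z)` assumed to return log p(x|z); the paper's actual refinement uses a more careful optimizer, but the differentiability argument is the same.

```python
import torch

def elbo(decoder, x, mu, logvar):
    # Reparameterized sample z ~ q(z|x) = N(mu, diag(exp(logvar)))
    eps = torch.randn_like(mu)
    z = mu + (0.5 * logvar).exp() * eps
    # Reconstruction term: decoder(x, z) is assumed to return log p(x|z)
    log_px_z = decoder(x, z)
    # Analytic KL( q(z|x) || N(0, I) ) for a diagonal Gaussian
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1)
    return (log_px_z - kl).mean()

def semi_amortized_loss(encoder, decoder, x, svi_steps=10, svi_lr=1.0):
    # 1) Amortized initialization from the inference network (AVI).
    mu, logvar = encoder(x)
    # 2) Instance-specific SVI refinement of the variational parameters.
    #    create_graph=True keeps these steps on the autograd graph, so
    #    gradients flow back through the refinement into the encoder.
    for _ in range(svi_steps):
        loss = -elbo(decoder, x, mu, logvar)
        g_mu, g_logvar = torch.autograd.grad(
            loss, (mu, logvar), create_graph=True)
        mu = mu - svi_lr * g_mu
        logvar = logvar - svi_lr * g_logvar
    # 3) Final ELBO at the refined parameters trains both networks end-to-end.
    return -elbo(decoder, x, mu, logvar)
```

In use, a single call such as `semi_amortized_loss(encoder, decoder, x).backward()` propagates gradients through the refinement steps to both the decoder and the encoder; setting `svi_steps=0` recovers a standard amortized VAE.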
