Recursive Inference for Variational Autoencoders

Inference networks of traditional Variational Autoencoders (VAEs) are typically amortized, resulting in relatively inaccurate posterior approximations compared to instance-wise variational optimization. Recent semi-amortized approaches address this drawback, but their iterative gradient-update procedures can be computationally demanding. To address both issues, in this paper we introduce an accurate amortized inference algorithm. We propose a novel recursive mixture estimation algorithm for VAEs that iteratively augments the current mixture with new components so as to maximally reduce the divergence between the variational and the true posteriors. Using the functional gradient approach, we devise an intuitive learning criterion for selecting a new mixture component: the new component has to improve the data likelihood (lower bound) and, at the same time, be as divergent from the current mixture distribution as possible, thereby increasing representational diversity. Unlike recently proposed boosted variational inference (BVI), which optimizes a single non-amortized inference instance, our method relies on amortized inference. A crucial benefit of our approach is that inference at test time requires a single feed-forward pass through the mixture inference network, making it significantly faster than semi-amortized approaches. We show that our approach yields higher test data likelihood than the state-of-the-art on several benchmark datasets.
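To make the selection criterion concrete, below is a minimal PyTorch sketch; it is not the authors' implementation, and the network sizes, the Gaussian decoder likelihood, the uniform mixture weights, and the trade-off coefficient beta are all illustrative assumptions. It trains a candidate new encoder component to raise its per-component ELBO while keeping its posterior samples in low-density regions of the frozen current mixture, mirroring the two-part criterion described above.

import math
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    # One mixture component: x -> N(mu(x), diag(exp(logvar(x)))).
    def __init__(self, x_dim, z_dim, h_dim=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Tanh())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

def gaussian_log_prob(z, mu, logvar):
    # log N(z; mu, diag(exp(logvar))), summed over latent dimensions.
    return -0.5 * (logvar + (z - mu) ** 2 / logvar.exp()
                   + math.log(2 * math.pi)).sum(-1)

def mixture_log_prob(z, comp_params):
    # log q_mix(z|x) for a uniform mixture of diagonal Gaussians
    # (the paper learns mixture weights; uniform is a simplification).
    logs = torch.stack([gaussian_log_prob(z, mu, lv) for mu, lv in comp_params], dim=-1)
    return torch.logsumexp(logs, dim=-1) - math.log(len(comp_params))

def new_component_loss(x, new_enc, frozen_encs, decoder, beta=1.0):
    # Criterion for a candidate component: maximize its ELBO while staying
    # divergent from the current mixture (penalize log q_mix at its samples).
    mu, logvar = new_enc(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
    recon = -0.5 * ((decoder(z) - x) ** 2).sum(-1)         # Gaussian log p(x|z), up to a constant
    log_prior = gaussian_log_prob(z, torch.zeros_like(z), torch.zeros_like(z))
    elbo = recon + log_prior - gaussian_log_prob(z, mu, logvar)
    if frozen_encs:
        with torch.no_grad():                              # freeze existing components
            comp_params = [enc(x) for enc in frozen_encs]
        repulsion = mixture_log_prob(z, comp_params)       # gradient flows through z only
    else:
        repulsion = torch.zeros_like(elbo)
    return (-elbo + beta * repulsion).mean()

# Usage sketch: components are added one stage at a time, each trained with
# new_component_loss while earlier components stay frozen.
x = torch.randn(32, 784)
decoder = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 784))
frozen = [GaussianEncoder(784, 8)]                         # previously trained component(s)
new_enc = GaussianEncoder(784, 8)
new_component_loss(x, new_enc, frozen, decoder).backward()

In a full training loop one would alternate such stages, freezing each learned component before fitting the next; the functional-gradient derivation in the paper determines the exact form of the two terms, which this sketch only approximates.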
