Training Variational Autoencoders with Discrete Latent Variables Using Importance Sampling

The Variational Autoencoder (VAE) is a popular generative latent variable model that is often used for representation learning. Standard VAEs assume continuous-valued latent variables and are trained by maximizing the evidence lower bound (ELBO). Conventional methods obtain a differentiable estimate of the ELBO via reparametrized sampling and optimize it with Stochastic Gradient Descent (SGD). However, this approach does not carry over to VAEs with discrete-valued latent variables, since reparametrized sampling is not possible for discrete distributions. In this paper, we propose a simple method to train VAEs with binary or categorical latent representations. To this end, we use a differentiable estimator of the ELBO that is based on importance sampling. In experiments, we verify the approach and train two different VAE architectures with Bernoulli and categorically distributed latent representations on two benchmark datasets.
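To illustrate the idea, below is a minimal sketch (not necessarily the paper's exact implementation) of an importance-sampled ELBO for Bernoulli latents in PyTorch. The key point is that the latent samples are drawn from a fixed, parameter-free proposal g(z), here uniform over {0,1}^D, so the estimator

    ELBO ≈ (1/N) Σ_i [q(z_i|x)/g(z_i)] (log p(x|z_i) + log p(z_i) − log q(z_i|x)),  z_i ~ g,

is differentiable with respect to the encoder parameters through the importance weights, and no reparameterization trick is needed. The `encoder` and `decoder` modules and the uniform prior are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def elbo_importance_sampling(x, encoder, decoder, num_samples=64):
    """Differentiable ELBO estimate for a VAE with D Bernoulli latents.

    Samples z are drawn from a parameter-free uniform proposal
    g(z) = 0.5**D, so gradients reach the encoder through the
    importance weights q(z|x) / g(z) rather than through the samples.
    """
    q_logits = encoder(x)                        # (B, D) logits of q(z|x)
    B, D = q_logits.shape
    N = num_samples

    # z ~ g(z): uniform over {0, 1}^D, independent of encoder parameters.
    z = torch.randint(0, 2, (N, B, D), device=x.device).to(x.dtype)

    # log q(z|x): Bernoulli log-likelihood of each proposal sample.
    log_q = -F.binary_cross_entropy_with_logits(
        q_logits.expand(N, B, D), z, reduction="none").sum(-1)    # (N, B)

    # log g(z) = log p(z) = -D * log 2 (uniform proposal, uniform prior).
    log_g = -D * torch.log(torch.tensor(2.0, device=x.device))

    # Importance weights w = q(z|x) / g(z); differentiable w.r.t. encoder.
    # (For large D these weights can have high variance -- a known caveat.)
    w = torch.exp(log_q - log_g)                                  # (N, B)

    # log p(x|z): Bernoulli decoder likelihood of the data.
    x_logits = decoder(z.reshape(N * B, D)).reshape(N, B, -1)
    log_px_z = -F.binary_cross_entropy_with_logits(
        x_logits, x.expand(N, B, -1), reduction="none").sum(-1)   # (N, B)

    # ELBO ≈ (1/N) Σ_i w_i * (log p(x|z_i) + log p(z_i) - log q(z_i|x))
    elbo = (w * (log_px_z + log_g - log_q)).mean(0)               # (B,)
    return elbo.mean()
```

In training, one would minimize the negative of this estimate with SGD or Adam; because the expectation over z is replaced by weighted samples from g, the estimator remains differentiable even though z itself is discrete.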
