Continuous Relaxation Training of Discrete Latent Variable Image Models

Despite recent improvements in training methodology, discrete latent variable models have failed to achieve the performance and popularity of their continuous counterparts. Here, we evaluate several approaches to training large-scale image models on CIFAR-10 using a probabilistic variant of the recently proposed Vector Quantized VAE architecture. We find that biased estimators such as continuous relaxations provide reliable methods for training these models, while unbiased score-function-based estimators such as VIMCO struggle in high-dimensional discrete spaces. Furthermore, we observe that the learned discrete codes lie on low-dimensional manifolds, indicating that discrete latent variables can learn to represent continuous latent quantities. Our findings show that continuous relaxation training of discrete latent variable models is a powerful method for learning representations that can flexibly capture both continuous and discrete aspects of natural data.
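
For concreteness, the sketch below illustrates one common continuous relaxation of discrete sampling, the Gumbel-Softmax (Concrete) estimator, which replaces non-differentiable categorical samples with temperature-annealed soft samples through which gradients can flow. The PyTorch framing, function name, and shapes are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def sample_gumbel_softmax(logits, temperature=1.0, eps=1e-20):
    """Draw a differentiable, approximately one-hot sample from a categorical
    distribution parameterised by `logits` via the Gumbel-Softmax relaxation.
    (Illustrative sketch; not the paper's implementation.)"""
    # Sample Gumbel(0, 1) noise with the inverse-CDF trick.
    uniform = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(uniform + eps) + eps)
    # Perturb the logits and anneal with a temperature: low temperatures
    # approach discrete one-hot samples, high temperatures approach uniform.
    return F.softmax((logits + gumbel) / temperature, dim=-1)

# Example: relax a batch of 4 latent variables, each over 8 discrete codes.
logits = torch.randn(4, 8, requires_grad=True)
relaxed_sample = sample_gumbel_softmax(logits, temperature=0.5)
relaxed_sample.sum().backward()  # gradients flow back to `logits` through the relaxation
```

Because the softmax output is a biased but low-variance surrogate for a true categorical sample, this kind of relaxation tends to scale to the high-dimensional discrete latent spaces where score-function estimators such as VIMCO become noisy.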