NVAE: A Deep Hierarchical Variational Autoencoder

Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among the competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of research on VAEs focuses on statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions, and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets, and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state of the art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256$\times$256 pixels. The source code is publicly available.
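For context on the residual parameterization of Normal distributions mentioned above: at each level of NVAE's hierarchy, the approximate posterior predicts an offset (Δμ, Δσ) relative to the prior's mean and standard deviation, which simplifies the per-level KL term in the variational bound. The following PyTorch sketch illustrates that simplified KL; the function name, tensor shapes, and log-scale parameterization are illustrative assumptions rather than the released implementation.

```python
# A minimal sketch of NVAE's residual Normal parameterization, assuming PyTorch.
import torch


def residual_normal_kl(mu_p, log_sigma_p, delta_mu, log_delta_sigma):
    """KL(q || p) per sample for a residual-parameterized Normal posterior.

    Prior:     p(z) = N(mu_p, sigma_p)
    Posterior: q(z) = N(mu_p + delta_mu, sigma_p * delta_sigma)

    With this parameterization the per-dimension KL reduces to
        0.5 * (delta_mu^2 / sigma_p^2 + delta_sigma^2 - log(delta_sigma^2) - 1).
    """
    sigma_p = torch.exp(log_sigma_p)
    delta_sigma_sq = torch.exp(2.0 * log_delta_sigma)
    kl = 0.5 * ((delta_mu / sigma_p) ** 2 + delta_sigma_sq
                - 2.0 * log_delta_sigma - 1.0)
    # Sum over latent dimensions, keep the batch dimension.
    return kl.flatten(start_dim=1).sum(dim=1)


# Illustrative shapes only: a batch of 8 latent feature maps of size 4x16x16.
mu_p = torch.zeros(8, 4, 16, 16)
log_sigma_p = torch.zeros(8, 4, 16, 16)
delta_mu = 0.1 * torch.randn(8, 4, 16, 16)
log_delta_sigma = 0.05 * torch.randn(8, 4, 16, 16)
print(residual_normal_kl(mu_p, log_sigma_p, delta_mu, log_delta_sigma).shape)  # torch.Size([8])
```

Because the posterior is defined relative to the prior, a posterior that matches the prior corresponds to Δμ = 0 and Δσ = 1, where the KL above is exactly zero; this is the motivation the paper gives for the parameterization being easier to optimize in deep hierarchies.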
