LaDDer: Latent Data Distribution Modelling with a Generative Prior

In this paper, we show that the performance of a learnt generative model is closely related to the model's ability to accurately represent the inferred \textbf{latent data distribution}, i.e. its topology and structural properties. We propose LaDDer to achieve accurate modelling of the latent data distribution in a variational autoencoder (VAE) framework and to facilitate better representation learning. The central idea of LaDDer is a meta-embedding concept: multiple VAE models learn an embedding of the embeddings, forming a ladder of encodings. We use a non-parametric mixture as the hyper-prior for the innermost VAE and learn all the parameters in a unified variational framework. Extensive experiments show that LaDDer accurately estimates complex latent distributions and improves representation quality. We also propose a novel latent space interpolation method that utilises the derived data distribution.
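The ladder-of-encodings idea can be illustrated with a minimal sketch: an inner encoder is applied to the latent codes produced by an outer encoder, yielding a meta-embedding of the embeddings. The dimensions, the untrained random weights, and the purely linear Gaussian encoders below are all hypothetical simplifications for illustration, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_toy_gaussian_encoder(in_dim, latent_dim):
    """Toy VAE-style Gaussian encoder with random (untrained) linear
    mean and log-variance heads -- illustration only."""
    w_mu = rng.normal(size=(in_dim, latent_dim)) / np.sqrt(in_dim)
    w_lv = rng.normal(size=(in_dim, latent_dim)) / np.sqrt(in_dim)

    def encode(x):
        mu, logvar = x @ w_mu, x @ w_lv
        eps = rng.normal(size=mu.shape)
        # Reparameterisation trick: z = mu + sigma * eps
        return mu + np.exp(0.5 * logvar) * eps

    return encode

# Ladder of encodings: the inner encoder embeds the outer encoder's codes.
x = rng.normal(size=(128, 784))            # hypothetical flattened images
outer = make_toy_gaussian_encoder(784, 32)
inner = make_toy_gaussian_encoder(32, 8)   # meta-embedding of the embeddings
z1 = outer(x)                              # data -> first latent space
z2 = inner(z1)                             # latents -> innermost latent space
print(z1.shape, z2.shape)
```

In the full model, the innermost codes would be regularised by the non-parametric mixture hyper-prior and all parameters trained jointly under one variational objective, which this sketch omits.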
