Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations

We would like to learn a representation of the data that decomposes an observation into factors of variation that we can independently control. Specifically, we want to use minimal supervision to learn a latent representation that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a collection of face images grouped by identity. We wish to anchor the semantics of the grouping in a relevant, disentangled representation that we can easily exploit. However, existing deep probabilistic models often assume that observations are independent and identically distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of a set of grouped observations. The ML-VAE separates the latent representation into semantically meaningful parts by working both at the group level and the observation level, while retaining efficient test-time inference. Quantitative and qualitative evaluations show that the ML-VAE model (i) learns a semantically meaningful disentanglement of grouped data, (ii) enables manipulation of the latent representation, and (iii) generalises to unseen groups.

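To make the group-level/observation-level split concrete, below is a minimal PyTorch sketch of the idea described above. It is illustrative only and not the authors' implementation: the class name MLVAE, the layer sizes, and the choice of a product-of-Gaussians accumulation of group evidence for the shared "content" latent are assumptions made for this sketch.

import torch
import torch.nn as nn


class MLVAE(nn.Module):
    def __init__(self, x_dim=784, style_dim=10, content_dim=10, hidden=256):
        super().__init__()
        # Shared encoder trunk applied to each observation.
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        # Observation-level ("style") posterior parameters.
        self.style_mu = nn.Linear(hidden, style_dim)
        self.style_logvar = nn.Linear(hidden, style_dim)
        # Group-level ("content") posterior parameters, produced per observation
        # and then combined across the group.
        self.content_mu = nn.Linear(hidden, content_dim)
        self.content_logvar = nn.Linear(hidden, content_dim)
        self.dec = nn.Sequential(
            nn.Linear(style_dim + content_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim), nn.Sigmoid(),
        )

    @staticmethod
    def group_posterior(mu, logvar):
        # Combine the per-observation content posteriors of one group into a
        # single Gaussian via a product of Gaussians (precision-weighted average).
        precision = torch.exp(-logvar)                      # (group_size, content_dim)
        group_var = 1.0 / precision.sum(dim=0)              # (content_dim,)
        group_mu = group_var * (precision * mu).sum(dim=0)  # (content_dim,)
        return group_mu, torch.log(group_var)

    @staticmethod
    def sample(mu, logvar):
        # Reparameterisation trick.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x_group):
        # x_group: (group_size, x_dim), all observations sharing one factor of
        # variation (e.g. the same face identity).
        h = self.enc(x_group)
        style = self.sample(self.style_mu(h), self.style_logvar(h))   # one style per observation
        c_mu, c_logvar = self.group_posterior(self.content_mu(h), self.content_logvar(h))
        content = self.sample(c_mu, c_logvar)                         # one content per group
        content = content.unsqueeze(0).expand(x_group.size(0), -1)    # broadcast to the group
        return self.dec(torch.cat([style, content], dim=1))


# Usage: reconstruct a group of four flattened 28x28 images that share an identity.
x_group = torch.rand(4, 784)
recon = MLVAE()(x_group)
print(recon.shape)  # torch.Size([4, 784])

Under this sketch's assumptions, the product of Gaussians makes the shared content posterior sharper as more observations of the same group are encoded, while each observation keeps its own style latent; swapping content vectors between groups is then one way to manipulate the learned representation.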