Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations

We would like to learn a representation of the data that decomposes an observation into factors of variation that we can independently control. Specifically, we want to use minimal supervision to learn a latent representation that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a collection of face images grouped by identity. We wish to anchor the semantics of the grouping in a relevant, disentangled representation that we can easily exploit. However, existing deep probabilistic models often assume that observations are independent and identically distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of a set of grouped observations. The ML-VAE separates the latent representation into semantically meaningful parts by working both at the group level and the observation level, while retaining efficient test-time inference. Quantitative and qualitative evaluations show that the ML-VAE model (i) learns a semantically meaningful disentanglement of grouped data, (ii) enables manipulation of the latent representation, and (iii) generalises to unseen groups.

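To make the group-level/observation-level split concrete, below is a minimal PyTorch sketch of the idea described above. It is illustrative only and not the authors' implementation: the class name MLVAE, the layer sizes, and the choice of a product-of-Gaussians accumulation of group evidence for the shared "content" latent are assumptions made for this sketch.

import torch
import torch.nn as nn


class MLVAE(nn.Module):
    def __init__(self, x_dim=784, style_dim=10, content_dim=10, hidden=256):
        super().__init__()
        # Shared encoder trunk applied to each observation.
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        # Observation-level ("style") posterior parameters.
        self.style_mu = nn.Linear(hidden, style_dim)
        self.style_logvar = nn.Linear(hidden, style_dim)
        # Group-level ("content") posterior parameters, produced per observation
        # and then combined across the group.
        self.content_mu = nn.Linear(hidden, content_dim)
        self.content_logvar = nn.Linear(hidden, content_dim)
        self.dec = nn.Sequential(
            nn.Linear(style_dim + content_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim), nn.Sigmoid(),
        )

    @staticmethod
    def group_posterior(mu, logvar):
        # Combine the per-observation content posteriors of one group into a
        # single Gaussian via a product of Gaussians (precision-weighted average).
        precision = torch.exp(-logvar)                      # (group_size, content_dim)
        group_var = 1.0 / precision.sum(dim=0)              # (content_dim,)
        group_mu = group_var * (precision * mu).sum(dim=0)  # (content_dim,)
        return group_mu, torch.log(group_var)

    @staticmethod
    def sample(mu, logvar):
        # Reparameterisation trick.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x_group):
        # x_group: (group_size, x_dim), all observations sharing one factor of
        # variation (e.g. the same face identity).
        h = self.enc(x_group)
        style = self.sample(self.style_mu(h), self.style_logvar(h))   # one style per observation
        c_mu, c_logvar = self.group_posterior(self.content_mu(h), self.content_logvar(h))
        content = self.sample(c_mu, c_logvar)                         # one content per group
        content = content.unsqueeze(0).expand(x_group.size(0), -1)    # broadcast to the group
        return self.dec(torch.cat([style, content], dim=1))


# Usage: reconstruct a group of four flattened 28x28 images that share an identity.
x_group = torch.rand(4, 784)
recon = MLVAE()(x_group)
print(recon.shape)  # torch.Size([4, 784])

Under this sketch's assumptions, the product of Gaussians makes the shared content posterior sharper as more observations of the same group are encoded, while each observation keeps its own style latent; swapping content vectors between groups is then one way to manipulate the learned representation.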