Paper information - Hybrid VAE: Improving Deep Generative Models using Partial Observations

Hybrid VAE: Improving Deep Generative Models using Partial Observations

Deep neural network models trained on large labeled datasets are the state of the art in a wide variety of computer vision tasks. In many applications, however, labeled data is expensive to obtain or requires a time-consuming manual annotation process. In contrast, unlabeled data is often abundant. We present a principled framework to capitalize on unlabeled data by training deep generative models on both labeled and unlabeled data. We show that such a combination is beneficial because the unlabeled data acts as a data-driven form of regularization, allowing generative models trained on few labeled samples to reach the performance of fully supervised generative models trained on much larger datasets. We call our method Hybrid VAE (H-VAE) because it contains both generative and discriminative parts. We validate H-VAE on three large-scale datasets of different modalities: two face datasets (MultiPIE and CelebA) and a hand pose dataset (NYU Hand Pose). Our qualitative visualizations further support the improvements achieved by using partial observations.
