Structured Generative Adversarial Networks

We study the problem of conditional generative modeling based on designated semantics or structures. Existing models that build conditional generators either require massive numbers of labeled instances as supervision or are unable to accurately control the semantics of generated samples. We propose structured generative adversarial networks (SGANs) for semi-supervised conditional generative modeling. SGAN assumes the data x is generated conditioned on two independent latent variables: y, which encodes the designated semantics, and z, which contains other factors of variation. To ensure disentangled semantics in y and z, SGAN builds two collaborative games in the hidden space to minimize the reconstruction error of y and z, respectively. Training SGAN also involves solving two adversarial games whose equilibria concentrate at the true joint data distributions p(x, z) and p(x, y), which avoids spreading probability mass diffusely over the data space, a problem that MLE-based methods may suffer from. We assess SGAN by evaluating its trained networks and its performance on downstream tasks. We show that SGAN delivers a highly controllable generator and disentangled representations; it also establishes state-of-the-art results across multiple datasets when applied to semi-supervised image classification (1.27%, 5.73%, and 17.26% error rates on MNIST, SVHN, and CIFAR-10 using 50, 1000, and 4000 labels, respectively). Benefiting from the separate modeling of y and z, SGAN can generate images with high visual quality that strictly follow the designated semantics, and it can be extended to a wide spectrum of applications, such as style transfer.
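
The generative assumption above (x produced from two independent latents, y for the designated semantics and z for the remaining factors) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the uniform categorical prior on y, the standard normal prior on z, the dimensions, and the hypothetical `toy_generator`, an untrained single linear layer standing in for the real generator network. No adversarial or collaborative game is trained.

```python
import numpy as np

def sample_latents(rng, batch_size, num_classes, z_dim):
    """Draw y and z independently (both priors are illustrative assumptions)."""
    y = rng.integers(0, num_classes, size=batch_size)   # semantic code, e.g. a class label
    z = rng.standard_normal((batch_size, z_dim))        # all other factors of variation
    return y, z

def toy_generator(y, z, weights, num_classes):
    """Hypothetical stand-in for G(y, z): tanh of a linear map of [one_hot(y), z]."""
    one_hot = np.eye(num_classes)[y]
    inputs = np.concatenate([one_hot, z], axis=1)
    return np.tanh(inputs @ weights)

rng = np.random.default_rng(0)
num_classes, z_dim, x_dim = 10, 16, 64                  # assumed sizes
weights = rng.standard_normal((num_classes + z_dim, x_dim)) * 0.1

# Ordinary conditional sampling: one (y, z) pair per generated x.
y, z = sample_latents(rng, 4, num_classes, z_dim)
x = toy_generator(y, z, weights, num_classes)

# Hold z fixed and sweep y over every class: with a trained, disentangled
# generator this would change the semantics while keeping the "style".
z_fixed = np.repeat(z[:1], num_classes, axis=0)
y_all = np.arange(num_classes)
x_sweep = toy_generator(y_all, z_fixed, weights, num_classes)
```

Because y and z enter the generator as separate inputs, the sweep at the end is the kind of operation that separate modeling of the two latents makes possible; with random weights, as here, it only demonstrates the data flow.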
