Towards A Controllable Disentanglement Network

This article addresses two crucial problems of learning disentangled image representations, namely, controlling the degree of disentanglement during image editing, and balancing the disentanglement strength and the reconstruction quality. To encourage disentanglement, we devise distance covariance-based decorrelation regularization. Further, for the reconstruction step, our model leverages a soft target representation combined with the latent image code. By exploring the real-valued space of the soft target representation, we are able to synthesize novel images with the designated properties. To improve the perceptual quality of images generated by autoencoder (AE)-based models, we extend the encoder-decoder architecture with the generative adversarial network (GAN) by collapsing the AE decoder and the GAN generator into one. We also design a classification-based protocol to quantitatively evaluate the disentanglement strength of our model. The experimental results showcase the benefits of the proposed model.

[1]  Luc Van Gool,et al.  Disentangled Person Image Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[3]  Christopher Burgess,et al.  DARLA: Improving Zero-Shot Transfer in Reinforcement Learning , 2017, ICML.

[4]  Bogdan Raducanu,et al.  Invertible Conditional GANs for image editing , 2016, ArXiv.

[5]  Ole Winther,et al.  Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.

[6]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Roger B. Grosse,et al.  Isolating Sources of Disentanglement in Variational Autoencoders , 2018, NeurIPS.

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[10]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[11]  Hanjiang Lai,et al.  Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis , 2018, NeurIPS.

[12]  Adam Roberts,et al.  Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models , 2017, ICLR.

[13]  Guillaume Lample,et al.  Fader Networks: Manipulating Images by Sliding Attributes , 2017, NIPS.

[14]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[15]  Maria L. Rizzo,et al.  Measuring and testing dependence by correlation of distances , 2007, 0803.4101.

[16]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[17]  Hyunsoo Kim,et al.  Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.

[18]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Alexander A. Alemi,et al.  Deep Variational Information Bottleneck , 2017, ICLR.

[20]  Max Welling,et al.  Semi-supervised Learning with Deep Generative Models , 2014, NIPS.

[21]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[22]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[23]  Zhou Wang,et al.  Multi-scale structural similarity for image quality assessment , 2003 .

[24]  Karl Ridgeway,et al.  A Survey of Inductive Biases for Factorial Representation-Learning , 2016, ArXiv.

[25]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[26]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[27]  Bruno A. Olshausen,et al.  Discovering Hidden Factors of Variation in Deep Networks , 2014, ICLR.

[28]  Bernhard Schölkopf,et al.  Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations , 2018, ICML.

[29]  Aaron C. Courville,et al.  Adversarially Learned Inference , 2016, ICLR.

[30]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[31]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[32]  Chris Donahue,et al.  Semantically Decomposing the Latent Spaces of Generative Adversarial Networks , 2017, ICLR.

[33]  Chuan Li,et al.  Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks , 2016, ECCV.

[34]  Navdeep Jaitly,et al.  Adversarial Autoencoders , 2015, ArXiv.

[35]  Xiaoming Liu,et al.  Disentangled Representation Learning GAN for Pose-Invariant Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[37]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  LinLin Shen,et al.  Deep Feature Consistent Variational Autoencoder , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[39]  Abhishek Kumar,et al.  Variational Inference of Disentangled Latent Concepts from Unlabeled Observations , 2017, ICLR.

[40]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[41]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[42]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[43]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Jürgen Schmidhuber,et al.  Learning Factorial Codes by Predictability Minimization , 1992, Neural Computation.

[45]  Emilien Dupont,et al.  Joint-VAE: Learning Disentangled Joint Continuous and Discrete Representations , 2018, NeurIPS.