High-Fidelity Synthesis with Disentangled Representation

Learning disentangled representation of data without supervision is an important step towards improving the interpretability of generative models. Despite recent advances in disentangled representation learning, existing approaches often suffer from the trade-off between representation learning and generation performance i.e. improving generation quality sacrifices disentanglement performance). We propose an Information-Distillation Generative Adversarial Network (ID-GAN), a simple yet generic framework that easily incorporates the existing state-of-the-art models for both disentanglement learning and high-fidelity synthesis. Our method learns disentangled representation using VAE-based models, and distills the learned representation with an additional nuisance variable to the separate GAN-based generator for high-fidelity synthesis. To ensure that both generative models are aligned to render the same generative factors, we further constrain the GAN generator to maximize the mutual information between the learned latent code and the output. Despite the simplicity, we show that the proposed method is highly effective, achieving comparable image generation quality to the state-of-the-art methods using the disentangled representation. We also show that the proposed decomposition leads to an efficient and stable model design, and we demonstrate photo-realistic high-resolution image synthesis results (1024x1024 pixels) for the first time using the disentangled representations.

[1]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[2]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[3]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[4]  Gang Hua,et al.  CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Bernhard Schölkopf,et al.  Wasserstein Auto-Encoders , 2017, ICLR.

[6]  Kayhan Batmanghelich,et al.  Weakly Supervised Disentanglement by Pairwise Similarities , 2019, AAAI.

[7]  Xiang Wei,et al.  Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect , 2018, ICLR.

[8]  Guillaume Desjardins,et al.  Understanding disentangling in $\beta$-VAE , 2018, 1804.03599.

[9]  Alexander Lerchner,et al.  Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs , 2019, ArXiv.

[10]  Alexei A. Efros,et al.  Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Thomas Brox,et al.  Generating Images with Perceptual Similarity Metrics based on Deep Networks , 2016, NIPS.

[12]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Andriy Mnih,et al.  Disentangling by Factorising , 2018, ICML.

[14]  Sewoong Oh,et al.  InfoGAN-CR: Disentangling Generative Adversarial Networks with Contrastive Regularizers , 2019, ICML 2020.

[15]  Yee Whye Teh,et al.  Disentangling Disentanglement in Variational Autoencoders , 2018, ICML.

[16]  Yong-Liang Yang,et al.  HoloGAN: Unsupervised Learning of 3D Representations From Natural Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[17]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[18]  Alexander A. Alemi,et al.  GILBO: One Metric to Measure Them All , 2018, NeurIPS.

[19]  Gunhee Kim,et al.  IB-GAN: Disentangled Representation Learning with Information Bottleneck GAN , 2018 .

[20]  Aapo Hyvärinen,et al.  Variational Autoencoders and Nonlinear ICA: A Unifying Framework , 2019, AISTATS.

[21]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Michael Satosi Watanabe,et al.  Information Theoretical Analysis of Multivariate Correlation , 1960, IBM J. Res. Dev..

[23]  Stefano Soatto,et al.  Information Dropout: Learning Optimal Representations Through Noisy Computation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Roger B. Grosse,et al.  Isolating Sources of Disentanglement in Variational Autoencoders , 2018, NeurIPS.

[25]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[26]  Stefan Bauer,et al.  Disentangling Factors of Variations Using Few Labels , 2020, ICLR.

[27]  Guillaume Lample,et al.  Fader Networks: Manipulating Images by Sliding Attributes , 2017, NIPS.

[28]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Alexei A. Efros,et al.  Toward Multimodal Image-to-Image Translation , 2017, NIPS.

[30]  Guillaume Desjardins,et al.  Understanding disentangling in β-VAE , 2018, ArXiv.

[31]  Navdeep Jaitly,et al.  Adversarial Autoencoders , 2015, ArXiv.

[32]  José Lezama,et al.  Overcoming the Disentanglement vs Reconstruction Trade-off via Jacobian Supervision , 2018, ICLR.

[33]  Alán Aspuru-Guzik,et al.  Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules , 2016, ACS central science.

[34]  Bernhard Schölkopf,et al.  Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations , 2018, ICML.

[35]  Tieniu Tan,et al.  IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis , 2018, NeurIPS.

[36]  Thomas P. Minka,et al.  Divergence measures and message passing , 2005 .

[37]  Gerard de Melo,et al.  OOGAN: Disentangling GAN with One-Hot Sampling and Orthogonal Regularization , 2019 .

[38]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[39]  Sebastian Nowozin,et al.  Which Training Methods for GANs do actually Converge? , 2018, ICML.

[40]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[41]  Sergey Levine,et al.  Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow , 2018, ICLR.

[42]  Olivier Bachem,et al.  Recent Advances in Autoencoder-Based Representation Learning , 2018, ArXiv.

[43]  Toniann Pitassi,et al.  Flexibly Fair Representation Learning by Disentanglement , 2019, ICML.

[44]  Frank D. Wood,et al.  Learning Disentangled Representations with Semi-Supervised Deep Generative Models , 2017, NIPS.

[45]  Stefan Bauer,et al.  On the Fairness of Disentangled Representations , 2019, NeurIPS.

[46]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[47]  Ole Winther,et al.  Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.

[48]  Xavier Binefa,et al.  Learning Disentangled Representations with Reference-Based Variational Autoencoders , 2019, ICLR 2019.

[49]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Andrew Brock,et al.  Neural Photo Editing with Introspective Adversarial Networks , 2016, ICLR.

[51]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[52]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[53]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).