Latent Optimization for Non-adversarial Representation Disentanglement

Disentanglement between pose and content is a key task for artificial intelligence and has attracted much research interest. Current disentanglement methods include adversarial training and the introduction of cycle constraints. In this work, we present a novel disentanglement method that does not use adversarial training and achieves state-of-the-art performance. Our method applies latent optimization to an architecture borrowed from style transfer in order to enforce the separation of pose and content. We overcome the test-time generalization issues of latent optimization with a novel two-stage approach. In extensive experiments, our method achieves better disentanglement performance than both adversarial and non-adversarial methods that use the same level of supervision.
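The abstract describes the method only at a high level. The following is a toy sketch of the general idea, not the paper's implementation. A hypothetical linear generator `W` stands in for the style-transfer architecture, a squared-error loss stands in for the paper's reconstruction loss, and every name (`make_toy_data`, `latent_optimization`, `reconstruction_mse`, `fit_content_encoder`, `encode_content`) is illustrative. In the first stage there is no encoder and no discriminator. Instead, one free content code per sample and one free code per class are optimized by gradient descent jointly with the generator, and Gaussian noise plus an L2 penalty on the content codes serve as an assumed regularizer. The second stage is also an assumption about how the test-time generalization issue could be handled: a linear encoder is fitted to predict the learned content codes, so that new samples can be encoded without rerunning the optimization.

```python
import numpy as np


def make_toy_data(n_per_class=20, n_classes=3, dim=6, noise=0.05, seed=1):
    """Hypothetical toy data: each sample = class factor + per-sample content factor."""
    rng = np.random.default_rng(seed)
    a_content = rng.normal(size=(dim, 2))
    a_class = rng.normal(size=(dim, 2))
    class_factors = rng.normal(size=(n_classes, 2))
    y = np.repeat(np.arange(n_classes), n_per_class)
    content_factors = rng.normal(size=(y.size, 2))
    x = content_factors @ a_content.T + class_factors[y] @ a_class.T
    x += rng.normal(scale=noise, size=x.shape)
    return x, y


def reconstruction_mse(x, y, W, content, classes):
    """Mean squared error of the linear generator, evaluated without injected noise."""
    z = np.concatenate([content, classes[y]], axis=1)
    return float(np.mean((z @ W.T - x) ** 2))


def latent_optimization(x, y, n_classes, content_dim=2, class_dim=2,
                        steps=2000, lr=0.02, noise_std=0.1, reg=1e-3, seed=0):
    """Stage 1 sketch: jointly fit a generator W (a hypothetical linear stand-in
    for the style-transfer architecture), one free content code per sample and
    one free code per class. No encoder and no discriminator are used."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W = rng.normal(scale=0.1, size=(d, content_dim + class_dim))
    content = rng.normal(scale=0.1, size=(n, content_dim))
    classes = rng.normal(scale=0.1, size=(n_classes, class_dim))
    counts = np.maximum(np.bincount(y, minlength=n_classes), 1)[:, None]
    for _ in range(steps):
        # Assumed regularizer: noise on the content codes limits how much they
        # can carry, pushing information shared within a class into class codes.
        noisy = content + rng.normal(scale=noise_std, size=content.shape)
        z = np.concatenate([noisy, classes[y]], axis=1)
        err = z @ W.T - x                       # (n, d) residuals
        grad_z = 2.0 * err @ W                  # per-sample gradient of ||err_i||^2
        grad_W = 2.0 * err.T @ z / n            # gradient of the mean loss
        grad_content = grad_z[:, :content_dim] + 2.0 * reg * content
        grad_classes = np.zeros_like(classes)
        np.add.at(grad_classes, y, grad_z[:, content_dim:])
        grad_classes /= counts                  # average over each class's samples
        W -= lr * grad_W
        content -= lr * grad_content
        classes -= lr * grad_classes
    return W, content, classes


def fit_content_encoder(x, content):
    """Stage 2 sketch (assumption): amortize the learned content codes with a
    linear encoder plus bias, fitted by least squares."""
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])
    enc, *_ = np.linalg.lstsq(x1, content, rcond=None)
    return enc


def encode_content(enc, x):
    """Predict content codes for (possibly unseen) samples without optimization."""
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])
    return x1 @ enc


if __name__ == "__main__":
    x, y = make_toy_data()
    W, content, classes = latent_optimization(x, y, n_classes=3)
    print("stage 1 reconstruction MSE:", reconstruction_mse(x, y, W, content, classes))
    enc = fit_content_encoder(x, content)
    print("encoded content codes shape:", encode_content(enc, x).shape)
```

In this sketch each latent code takes a step on its own sample's loss while the generator steps on the batch-mean loss, and class-code gradients are averaged over each class's members. This per-parameter scaling keeps the step sizes of the codes and the generator comparable. It is a choice made for the toy example, not a detail taken from the paper.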