Paper Information - Latent Optimization for Non-adversarial Representation Disentanglement

Latent Optimization for Non-adversarial Representation Disentanglement

Disentanglement between pose and content is a key task for artificial intelligence and has attracted much research interest. Current methods for disentanglement include adversarial training and cycle constraints. In this work, we present a novel disentanglement method that does not use adversarial training and achieves state-of-the-art performance. Our method applies latent optimization to an architecture borrowed from style transfer in order to enforce the separation of pose and content. We overcome the test-time generalization issues of latent optimization with a novel two-stage approach. In extensive experiments, our method achieves better disentanglement performance than both adversarial and non-adversarial methods that use the same level of supervision.
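
As a rough illustration of what "latent optimization" means here, the sketch below treats the latent codes themselves as free parameters and fits them by gradient descent together with a generator, instead of predicting them with an encoder. It is not the authors' implementation. The linear generator, the per-class embedding table, the L2 penalty on the per-sample codes, and every name and hyperparameter (`latent_optimization`, `content_dim`, `class_dim`, `reg`, and so on) are assumptions made for this example, and the second stage mentioned in the abstract is not shown.

```python
import numpy as np

def latent_optimization(x, labels, n_classes, content_dim=2, class_dim=2,
                        lr=0.1, reg=1e-3, steps=500, seed=0):
    """Jointly fit per-sample codes, per-class embeddings and a linear generator.

    Hypothetical toy setup: x has shape (n, d) and labels holds integer
    class ids in [0, n_classes).
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Latent codes are free parameters: one per sample, one per class.
    content = 0.1 * rng.standard_normal((n, content_dim))
    class_emb = 0.1 * rng.standard_normal((n_classes, class_dim))
    # Assumed linear "generator" mapping [content, class] codes to data space.
    w = 0.1 * rng.standard_normal((content_dim + class_dim, d))
    losses = []
    for _ in range(steps):
        z = np.concatenate([content, class_emb[labels]], axis=1)
        residual = z @ w - x
        # Objective: mean squared reconstruction error plus an assumed
        # L2 penalty that keeps the per-sample codes small.
        losses.append(float(np.mean(residual ** 2) + reg * np.mean(content ** 2)))
        grad_out = 2.0 * residual / residual.size
        grad_w = z.T @ grad_out
        grad_z = grad_out @ w.T
        grad_content = grad_z[:, :content_dim] + 2.0 * reg * content / content.size
        # Samples of the same class share one embedding, so their
        # gradients are accumulated into the same row.
        grad_class = np.zeros_like(class_emb)
        np.add.at(grad_class, labels, grad_z[:, content_dim:])
        w -= lr * grad_w
        content -= lr * grad_content
        class_emb -= lr * grad_class
    return content, class_emb, w, losses
```

Because the codes are optimized only for the training samples, a model fitted this way has no direct means of producing codes for unseen inputs, which is the test-time generalization issue the abstract refers to.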

Yedid Hoshen | Aviv Gabbay
