Paper Information - TransGaGa: Geometry-Aware Unsupervised Image-To-Image Translation

Unsupervised image-to-image translation aims at learning a mapping between two visual domains. However, learning a translation across large geometry variations always ends up with failure. In this work, we present a novel disentangle-and-translate framework to tackle the complex objects image-to-image translation task. Instead of learning the mapping on the image space directly, we disentangle image space into a Cartesian product of the appearance and the geometry latent spaces. Specifically, we first introduce a geometry prior loss and a conditional VAE loss to encourage the network to learn independent but complementary representations. The translation is then built on appearance and geometry space separately. Extensive experiments demonstrate the superior performance of our method to other state-of-the-art approaches, especially in the challenging near-rigid and non-rigid objects translation tasks. In addition, by taking different exemplars as the appearance references, our method also supports multimodal translation. Project page: https://wywu.github.io/projects/TGaGa/TGaGa.html
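The data flow described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the names `Domain`, `translate`, and `make_toy_domain` are hypothetical, and the "encoders" simply split a four-element list into a geometry half and an appearance half, where the real method would use learned networks. It only shows the two ideas the abstract states: geometry and appearance are translated separately, and an exemplar from the target domain can supply the appearance code instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class Domain:
    """Hypothetical stand-ins for one domain's encoders and decoder."""
    encode_geometry: Callable[[Vector], Vector]
    encode_appearance: Callable[[Vector], Vector]
    decode: Callable[[Vector, Vector], Vector]


def translate(
    x: Vector,
    src: Domain,
    dst: Domain,
    geometry_map: Callable[[Vector], Vector],
    appearance_map: Callable[[Vector], Vector],
    exemplar: Optional[Vector] = None,
) -> Vector:
    """Translate x from src to dst via separate geometry and appearance codes."""
    # Geometry is always translated from the source image's geometry code.
    geometry = geometry_map(src.encode_geometry(x))
    if exemplar is not None:
        # Exemplar-guided case: take appearance from a target-domain image.
        appearance = dst.encode_appearance(exemplar)
    else:
        # Default case: translate the source image's own appearance code.
        appearance = appearance_map(src.encode_appearance(x))
    return dst.decode(geometry, appearance)


def make_toy_domain() -> Domain:
    """Toy assumption: first two entries are geometry, last two are appearance."""
    return Domain(
        encode_geometry=lambda v: v[:2],
        encode_appearance=lambda v: v[2:],
        decode=lambda g, a: g + a,  # list concatenation
    )


if __name__ == "__main__":
    a, b = make_toy_domain(), make_toy_domain()
    scale = lambda g: [2.0 * v for v in g]   # toy geometry translator
    shift = lambda p: [v + 1.0 for v in p]   # toy appearance translator
    print(translate([1.0, 2.0, 3.0, 4.0], a, b, scale, shift))
    print(translate([1.0, 2.0, 3.0, 4.0], a, b, scale, shift,
                    exemplar=[0.0, 0.0, 9.0, 9.0]))
```

Passing different exemplars changes only the appearance half of the output while the translated geometry stays fixed, which is the sense in which exemplar-guided translation yields multiple outputs for one input.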
