The GAN That Warped: Semantic Attribute Editing With Unpaired Data

Deep neural networks have recently been used to edit images with great success, in particular for faces. However, they are often limited to working at a restricted range of resolutions. Many methods are so flexible that face edits can result in an unwanted loss of identity. This work proposes to learn how to perform semantic image edits through the application of smooth warp fields. Previous approaches that attempted to use warping for semantic edits required paired data, i.e. example images of the same subject with different semantic attributes. In contrast, we employ recent advances in Generative Adversarial Networks that allow our model to be trained with unpaired data. We demonstrate face editing at very high resolutions (4k images) with a single forward pass of a deep network at a lower resolution. We also show that our edits are substantially better at preserving the subject's identity. The robustness of our approach is demonstrated by showing plausible image editing results on the Cub200 birds dataset. To our knowledge, this has not been previously accomplished, due to the challenging nature of the dataset.
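
The idea of predicting a smooth warp field at low resolution and applying it to a much larger image can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the warp is parameterised or sampled, so the sketch assumes a backward warp stored as displacements in fractions of image height and width, bilinear upsampling of the field, and bilinear sampling with border clamping. The function names (`bilinear_sample`, `upsample_flow`, `warp_image`) are hypothetical, and the low-resolution field stands in for the output of a network.

```python
import numpy as np


def bilinear_sample(img, ys, xs):
    """Sample img (H, W, C) at float coordinates ys, xs (each (h, w)).

    Coordinates outside the image are clamped to the border.
    """
    H, W = img.shape[:2]
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[..., None]  # vertical interpolation weight
    wx = (xs - x0)[..., None]  # horizontal interpolation weight
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy


def upsample_flow(flow_lo, out_h, out_w):
    """Bilinearly resize a (h, w, 2) displacement field to (out_h, out_w, 2).

    Assumption: displacements are in normalised units (fractions of image
    size), so their values need no rescaling when the grid is resized.
    """
    h, w = flow_lo.shape[:2]
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    gy, gx = np.meshgrid(ys, xs, indexing="ij")
    return bilinear_sample(flow_lo, gy, gx)


def warp_image(img_hi, flow_lo):
    """Backward-warp a high-resolution image with a low-resolution field.

    flow_lo[..., 0] is the vertical and flow_lo[..., 1] the horizontal
    displacement; each output pixel reads from (position + displacement).
    """
    H, W = img_hi.shape[:2]
    flow = upsample_flow(flow_lo, H, W)
    gy, gx = np.meshgrid(
        np.arange(H, dtype=float), np.arange(W, dtype=float), indexing="ij"
    )
    src_y = gy + flow[..., 0] * (H - 1)
    src_x = gx + flow[..., 1] * (W - 1)
    return bilinear_sample(img_hi, src_y, src_x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64, 3))          # stand-in for a large photo
    field = 0.02 * rng.standard_normal((8, 8, 2))  # stand-in for a network output
    print(warp_image(image, field).shape)
```

Because the output is resampled directly from the input pixels, a smooth field of this kind can only move existing content around, which is one intuition for why warping tends to preserve identity and scale to resolutions the network never saw.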
