论文信息 - MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

Facial image manipulation has achieved great progress in recent years. However, previous methods either operate on a predefined set of face attributes or leave users little freedom to interactively manipulate images. To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation. Our key insight is that semantic masks serve as a suitable intermediate representation for flexible face manipulation with fidelity preservation. MaskGAN has two main components: 1) Dense Mapping Network (DMN) and 2) Editing Behavior Simulated Training (EBST). Specifically, DMN learns style mapping between a free-form user modified mask and a target image, enabling diverse generation results. EBST models the user editing behavior on the source mask, making the overall framework more robust to various manipulated inputs. Specifically, it introduces dual-editing consistency as the auxiliary supervision signal. To facilitate extensive studies, we construct a large-scale high-resolution face dataset with fine-grained mask annotations named CelebAMask-HQ. MaskGAN is comprehensively evaluated on two challenging tasks: attribute transfer and style copy, demonstrating superior performance over other state-of-the-art methods. The code, models, and dataset are available at https://github.com/switchablenorms/CelebAMask-HQ.

[1] Ting-Chun Wang,et al. Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[2] Jung-Woo Ha,et al. StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3] Yiying Tong,et al. FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[4] Taesung Park,et al. Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Hyunsoo Kim,et al. Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.

[6] Thomas S. Huang,et al. Free-Form Image Inpainting With Gated Convolution , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7] Winston H. Hsu,et al. PIVTONS: Pose Invariant Virtual Try-On Shoe with Conditional Image Completion , 2018, ACCV.

[8] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.

[9] Larry S. Davis,et al. VITON: An Image-Based Virtual Try-on Network , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10] Xiaogang Wang,et al. Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[11] Winston H. Hsu,et al. Attribute Augmented Convolutional Neural Network for Face Hallucination , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[12] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[13] Liang Lin,et al. Toward Characteristic-Preserving Image-based Virtual Try-On Network , 2018, ECCV.

[14] Alexei A. Efros,et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Stefanos Zafeiriou,et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[18] Lu Yuan,et al. Mask-Guided Portrait Editing With Conditional GANs , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Chen Change Loy,et al. Instance-level Facial Attributes Transfer with Geometry-Aware Flow , 2018, AAAI.

[20] Vladlen Koltun,et al. Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21] Hao Li,et al. paGAN: real-time avatars using dynamic textures , 2019, ACM Trans. Graph..

[22] Li Fei-Fei,et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[23] Thomas S. Huang,et al. Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24] Andrea Vedaldi,et al. Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[25] Chao Dong,et al. Recovering Realistic Texture in Image Super-Resolution by Deep Spatial Feature Transform , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26] Xiaoxiao Li,et al. Semantic Image Segmentation via Deep Parsing Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27] Ruimao Zhang,et al. Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Jan Kautz,et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29] Guillaume Lample,et al. Fader Networks: Manipulating Images by Sliding Attributes , 2017, NIPS.

[30] Youngjoo Jo,et al. SC-FEGAN: Face Editing Generative Adversarial Network With User’s Sketch and Color , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31] David Zhang,et al. Deep Identity-aware Transfer of Facial Attributes , 2016, ArXiv.

[32] Tom Duff,et al. Compositing digital images , 1984, SIGGRAPH.

[33] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[34] Kun Zhou,et al. Warp-guided GANs for single-photo facial animation , 2018, ACM Trans. Graph..

[35] Jan Kautz,et al. Unsupervised Image-to-Image Translation Networks , 2017, NIPS.

[36] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[37] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[38] Jinwen Ma,et al. ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes , 2018, ECCV.

[39] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[40] Jaakko Lehtinen,et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[41] Alexei A. Efros,et al. Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Siwei Zhang,et al. One-shot Face Reenactment , 2019, BMVC.

[43] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[44] Ziwei Liu,et al. Semantic Facial Expression Editing using Autoencoded Flow , 2016, ArXiv.

[45] Serge J. Belongie,et al. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[46] James M. Rehg,et al. Fine-Grained Head Pose Estimation Without Keypoints , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[47] Thomas S. Huang,et al. Interactive Facial Feature Localization , 2012, ECCV.

[48] Simon Osindero,et al. Conditional Generative Adversarial Nets , 2014, ArXiv.

[49] Bogdan Raducanu,et al. Invertible Conditional GANs for image editing , 2016, ArXiv.