Towards harmonized regional style transfer and manipulation for facial images

Regional facial image synthesis conditioned on semantic mask has achieved great success using generative adversarial networks. However, the appearance of different regions may be inconsistent with each other when conducting regional image editing. In this paper, we focus on the problem of harmonized regional style transfer and manipulation for facial images. The proposed approach supports regional style transfer and manipulation at the same time. A multi-scale encoder and style mapping networks are proposed in our work. The encoder is responsible for extracting regional styles of real faces. Style mapping networks generate styles from random samples for all facial regions. As the key part of our work, we propose a multi-region style attention module to adapt the multiple regional style embeddings from a reference image to a target image for generating harmonious and plausible results. Furthermore, we propose a new metric “harmony score” and conduct experiments in a challenging setting: three widely used face datasets are involved and we test the model by transferring the regional facial appearance between datasets. Images in different datasets are usually quite different, which makes the inconsistency between target and reference regions more obvious. Results show that our model can generate reliable style transfer and multi-modal manipulation results compared with SOTAs. Furthermore, we show two face editing applications using the proposed approach.

[1]  Peter Wonka,et al.  StyleFlow: Attribute-conditioned Exploration of StyleGAN-Generated Images using Conditional Continuous Normalizing Flows , 2020, ArXiv.

[2]  Ming-Hsuan Yang,et al.  Deep Image Harmonization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Chen Gao,et al.  PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[5]  Lingyun Wu,et al.  MaskGAN: Towards Diverse and Interactive Facial Image Manipulation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Alexei A. Efros,et al.  Toward Multimodal Image-to-Image Translation , 2017, NIPS.

[7]  Peter Wonka,et al.  SEAN: Image Synthesis With Semantic Region-Adaptive Normalization , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[9]  Xiang Bai,et al.  Semantically Multi-Modal Image Synthesis , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Lu Yuan,et al.  Mask-Guided Portrait Editing With Conditional GANs , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Shi-Min Hu,et al.  Example-Guided Style-Consistent Image Synthesis From Semantic Labeling , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Han Zhang,et al.  Self-Attention Generative Adversarial Networks , 2018, ICML.

[14]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Deli Zhao,et al.  In-Domain GAN Inversion for Real Image Editing , 2020, ECCV.

[17]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[18]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[19]  Alexei A. Efros,et al.  Learning a Discriminative Model for the Perception of Realism in Composite Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[21]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[22]  Vladlen Koltun,et al.  Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Bolei Zhou,et al.  Closed-Form Factorization of Latent Semantics in GANs , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Hailin Shi,et al.  A New Dataset and Boundary-Attention Semantic Segmentation for Face Parsing , 2020, AAAI.

[26]  Bolei Zhou,et al.  Interpreting the Latent Space of GANs for Semantic Face Editing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[29]  Lu Yuan,et al.  Cross-Domain Correspondence Learning for Exemplar-Based Image Translation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[31]  Daniel Cohen-Or,et al.  Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Liqing Zhang,et al.  DoveNet: Deep Image Harmonization via Domain Verification , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[34]  Raymond Y. K. Lau,et al.  Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Christian Theobalt,et al.  StyleRig: Rigging StyleGAN for 3D Control Over Portrait Images , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[37]  Seunghoon Hong,et al.  Diversity-Sensitive Conditional Generative Adversarial Networks , 2019, ICLR.

[38]  Jaakko Lehtinen,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[40]  Jaegul Choo,et al.  Reference-Based Sketch Image Colorization Using Augmented-Self Reference and Dense Semantic Correspondence , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[42]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Chi-Man Pun,et al.  Improving the Harmony of the Composite Image by Spatial-Separated Attention Module , 2019, IEEE Transactions on Image Processing.

[44]  Jung-Woo Ha,et al.  StarGAN v2: Diverse Image Synthesis for Multiple Domains , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Shakir Mohamed,et al.  Variational Inference with Normalizing Flows , 2015, ICML.