AgileAvatar: Stylized 3D Avatar Creation via Cascaded Domain Bridging

Stylized 3D avatars have become increasingly prominent in modern life. Creating these avatars manually usually involves laborious selection and adjustment of continuous and discrete parameters and is time-consuming for average users. Self-supervised approaches that automatically create 3D avatars from user selfies promise high quality at little annotation cost, but they fall short when applied to stylized avatars because of the large style domain gap. We propose a novel self-supervised learning framework to create high-quality stylized 3D avatars with a mix of continuous and discrete parameters. Our cascaded domain bridging framework first leverages a modified portrait stylization approach to translate input selfies into stylized avatar renderings, which serve as the targets for the desired 3D avatars. Next, we find the avatar parameters that best match these stylized renderings through a differentiable imitator that we train to mimic the avatar graphics engine. To optimize the discrete parameters effectively, we adopt a cascaded relaxation-and-search pipeline. We use a human preference study to evaluate how well our method preserves user identity compared with previous work and with manual creation. Our results achieve much higher preference scores than previous work and come close to those of manual creation. We also provide an ablation study to justify the design choices in our pipeline.
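The abstract names a cascaded relaxation-and-search pipeline for the discrete parameters but does not give its details. The following is a minimal toy sketch of the general idea, not the authors' implementation. It assumes a hypothetical linear `ToyImitator` in place of the trained neural imitator, a feature-vector target in place of a stylized rendering, and a single discrete attribute relaxed with a softmax (a common relaxation, not necessarily the one used in the paper). The search stage then refits the continuous parameters for the top-k relaxed candidates. All names (`ToyImitator`, `relax`, `relax_and_search`) are illustrative.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D array of logits.
    e = np.exp(z - z.max())
    return e / e.sum()


class ToyImitator:
    """Hypothetical stand-in for a learned differentiable imitator.

    Maps continuous parameters c (shape n_c) and a (possibly relaxed)
    one-hot choice p (shape n_d) to a feature vector A @ c + B @ p.
    A real imitator would be a neural network that produces a rendering.
    """

    def __init__(self, A, B):
        self.A = A  # (m, n_c): effect of the continuous parameters
        self.B = B  # (m, n_d): effect of each discrete option

    def render(self, c, p):
        return self.A @ c + self.B @ p


def loss_and_grads(imitator, c, logits, target):
    # Squared-error loss against the target, plus its analytic gradients.
    p = softmax(logits)
    r = imitator.render(c, p) - target
    g_out = 2.0 * r
    g_c = imitator.A.T @ g_out
    g_p = imitator.B.T @ g_out
    # Backprop through softmax: dL/dz_j = p_j * (g_p_j - p . g_p)
    g_logits = p * (g_p - p @ g_p)
    return float(r @ r), g_c, g_logits


def relax(imitator, target, n_c, n_d, steps=2000, lr=0.05):
    # Stage 1 (relaxation): treat the discrete choice as softmax(logits)
    # and optimize all parameters jointly with plain gradient descent.
    c = np.zeros(n_c)
    logits = np.zeros(n_d)
    for _ in range(steps):
        _, g_c, g_logits = loss_and_grads(imitator, c, logits, target)
        c -= lr * g_c
        logits -= lr * g_logits
    return c, logits


def relax_and_search(imitator, target, n_c, n_d, top_k=3, steps=2000, lr=0.05):
    # Stage 2 (search): rank the discrete options by the relaxed logits, try
    # the top-k as hard one-hot choices, refit c for each, and keep the best.
    _, logits = relax(imitator, target, n_c, n_d, steps, lr)
    best = None
    for k in np.argsort(-logits)[:top_k]:
        onehot = np.zeros(n_d)
        onehot[k] = 1.0
        # A closed-form refit works only because the toy imitator is linear;
        # with a neural imitator one would rerun gradient descent on c instead.
        c_k, *_ = np.linalg.lstsq(imitator.A, target - imitator.B @ onehot, rcond=None)
        r = imitator.render(c_k, onehot) - target
        loss = float(r @ r)
        if best is None or loss < best[2]:
            best = (int(k), c_k, loss)
    return best  # (chosen option index, continuous params, loss)
```

Ranking candidates by the relaxed logits keeps the search cheap when an attribute has many options. Setting `top_k` to the number of options turns the second stage into an exhaustive search. With several discrete attributes, one plausible extension is to repeat the search attribute by attribute. That is an assumption about what "cascaded" means here, not something the abstract states.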
