Self-supervised High-fidelity and Re-renderable 3D Facial Reconstruction from a Single Image

Reconstructing high-fidelity 3D facial texture from a single image is a challenging task since the lack of complete face information and the domain gap between the 3D face and 2D image. The most recent works tackle facial texture reconstruction problem by applying either generation-based or reconstruction-based methods. Although each method has its own advantage, none of them is capable of recovering a high-fidelity and re-renderable facial texture, where the term ’re-renderable’ demands the facial texture to be spatially complete and disentangled with environmental illumination. In this paper, we propose a novel self-supervised learning framework for reconstructing high-quality 3D faces from single-view images in-the-wild. Our main idea is to first utilize the prior generation module to produce a prior albedo, then leverage the detail refinement module to obtain detailed albedo. To further make facial textures disentangled with illumination, we present a novel detailed illumination representation which is reconstructed with the detailed albedo together. We also design several regularization loss functions on both the albedo side and illumination side to facilitate the disentanglement of these two factors. Finally, thanks to the differentiable rendering technique, our neural network can be efficiently trained in a self-supervised manner. Extensive experiments on challenging datasets demonstrate that our framework substantially outperforms state-of-the-art approaches in both qualitative and quantitative comparisons.

[1]  Ruigang Yang,et al.  FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Hao Li,et al.  paGAN: real-time avatars using dynamic textures , 2019, ACM Trans. Graph..

[3]  Bailin Deng,et al.  Lightweight Photometric Stereo for Facial Details Recovery , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Sami Romdhani,et al.  Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[6]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[7]  Hans-Peter Seidel,et al.  FML: Face Model Learning From Videos , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Justus Thies,et al.  Deferred Neural Rendering: Image Synthesis using Neural Textures , 2019 .

[9]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Changjie Fan,et al.  Neutral Face Game Character Auto-Creation via PokerFace-GAN , 2020, ACM Multimedia.

[11]  Justus Thies,et al.  Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Zeyu Chen,et al.  ReDA:Reinforced Differentiable Attribute for 3D Face Reconstruction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Xiaoming Liu,et al.  Nonlinear 3D Face Morphable Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Kenny Mitchell,et al.  Photo-Realistic Facial Details Synthesis From Single Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[17]  Lei Zhang,et al.  Face recognition from a single training image under arbitrary unknown lighting using spherical harmonics , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Thomas Vetter,et al.  Face Recognition Based on Fitting a 3D Morphable Model , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Shigeo Morishima,et al.  High-fidelity facial reflectance and geometry inference from an unconstrained image , 2018, ACM Trans. Graph..

[20]  Christian Theobalt,et al.  Reconstruction of Personalized 3D Face Rigs from Monocular Video , 2016, ACM Trans. Graph..

[21]  Fang Zhao,et al.  Multi-Prototype Networks for Unconstrained Set-based Face Recognition , 2019, IJCAI.

[22]  Yonggen Ling,et al.  Self-Supervised Learning of Detailed 3D Face Reconstruction , 2020, IEEE Transactions on Image Processing.

[23]  Hao Li,et al.  Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Jun Yu,et al.  Real-Time Head Pose Estimation and Face Modeling From a Depth Image , 2019, IEEE Transactions on Multimedia.

[25]  Tieniu Tan,et al.  A Light CNN for Deep Face Representation With Noisy Labels , 2015, IEEE Transactions on Information Forensics and Security.

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Feng Liu,et al.  Towards High-Fidelity Nonlinear 3D Face Morphable Model , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Hao Li,et al.  Avatar digitization from a single image for real-time rendering , 2017, ACM Trans. Graph..

[29]  Stefanos Zafeiriou,et al.  GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  M. Zollhöfer,et al.  Self-Supervised Multi-level Face Model Learning for Monocular Reconstruction at Over 250 Hz , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Thabo Beeler,et al.  3D Morphable Face Models—Past, Present, and Future , 2020, ACM Trans. Graph..

[32]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[33]  Patrick Pérez,et al.  MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Matan Sela,et al.  3D Face Reconstruction by Learning from Synthetic Data , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[35]  Xi Zhou,et al.  Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network , 2018, ECCV.

[36]  William T. Freeman,et al.  Unsupervised Training for 3D Morphable Model Regression , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Lingyun Wu,et al.  MaskGAN: Towards Diverse and Interactive Facial Image Manipulation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Martin D. Levine,et al.  State-of-the-art of 3D facial reconstruction methods for face recognition based on a single 2D training image per person , 2009, Pattern Recognit. Lett..

[39]  Andrea Vedaldi,et al.  Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Stefanos Zafeiriou,et al.  Large Scale 3D Morphable Models , 2017, International Journal of Computer Vision.

[41]  Bernhard Egger,et al.  Morphable Face Models - An Open Framework , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[42]  Kun Zhou,et al.  Towards High-Fidelity 3D Face Reconstruction From In-the-Wild Images Using Graph Convolutional Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Gang Yu,et al.  BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[44]  Matan Sela,et al.  Learning Detailed Face Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Stefanos Zafeiriou,et al.  AvatarMe: Realistically Renderable 3D Facial Reconstruction “In-the-Wild” , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Ping-Sing Tsai,et al.  Shape from Shading: A Survey , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[47]  Xiaogang Wang,et al.  Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[50]  Michael J. Black,et al.  Learning a model of facial shape and expression from 4D scans , 2017, ACM Trans. Graph..

[51]  Georgios Tzimiropoulos,et al.  Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[52]  Stefanos Zafeiriou,et al.  UV-GAN: Adversarial Facial UV Map Completion for Pose-Invariant Face Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Patrick Pérez,et al.  Deep video portraits , 2018, ACM Trans. Graph..

[54]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[55]  Bernhard Egger,et al.  Efficient Global Illumination for Morphable Models , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[56]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Jiaolong Yang,et al.  Accurate 3D Face Reconstruction With Weakly-Supervised Learning: From Single Image to Image Set , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[58]  Tatsuya Harada,et al.  Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[59]  William J. Christmas,et al.  A Multiresolution 3D Morphable Face Model and Fitting Framework , 2016, VISIGRAPP.