Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing

Robust face reconstruction from monocular image in general lighting conditions is challenging. Methods combining deep neural network encoders with differentiable rendering have opened up the path for very fast monocular reconstruction of geometry, lighting and reflectance. They can also be trained in self-supervised manner for increased robustness and better generalization. However, their differentiable rasterization-based image formation models, as well as underlying scene parameterization, limit them to Lambertian face reflectance and to poor shape details. More recently, ray tracing was introduced for monocular face reconstruction within a classic optimization-based framework and enables state-of-the art results. However, optimization-based approaches are inherently slow and lack robustness. In this paper, we build our work on the afore-mentioned approaches and propose a new method that greatly improves reconstruction quality and robustness in general scenes. We achieve this by combining a CNN encoder with a differentiable ray tracer, which enables us to base the reconstruction on much more advanced personalized diffuse and specular albedos, a more sophisticated illumination model and a plausible representation of self-shadows. This enables to take a big leap forward in reconstruction quality of shape, appearance and lighting even in scenes with difficult illumination. With consistent face attributes reconstruction, our method leads to practical applications such as relighting and self-shadows removal. Compared to state-of-the-art methods, our results show improved accuracy and validity of the approach.

[1]  Robert L. Cook,et al.  A Reflectance Model for Computer Graphics , 1987, TOGS.

[2]  James T. Kajiya,et al.  The rendering equation , 1986, SIGGRAPH.

[3]  S. Umeyama,et al.  Least-Squares Estimation of Transformation Parameters Between Two Point Patterns , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[5]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[6]  Tomas Akenine-Möller,et al.  Fast, Minimum Storage Ray-Triangle Intersection , 1997, J. Graphics, GPU, & Game Tools.

[7]  M. Gross,et al.  Analysis of human faces using a measurement-based skin reflectance model , 2006, ACM Trans. Graph..

[8]  Steve Marschner,et al.  Microfacet Models for Refraction through Rough Surfaces , 2007, Rendering Techniques.

[9]  Ravi Ramamoorthi,et al.  A Theory Of Frequency Domain Invariants: Spherical Harmonic Identities for BRDF/Lighting Transfer and Image Consistency , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Paul E. Debevec,et al.  Multiview face capture using polarized spherical gradient illumination , 2011, ACM Trans. Graph..

[11]  Hans-Peter Seidel,et al.  Shading-based dynamic shape refinement from multi-view video under general illumination , 2011, 2011 International Conference on Computer Vision.

[12]  Derek Bradley,et al.  High-quality passive facial performance capture using anchor frames , 2011, ACM Trans. Graph..

[13]  Hans-Peter Seidel,et al.  Lightweight binocular facial performance capture under uncontrolled lighting , 2012, ACM Trans. Graph..

[14]  Christian Theobalt,et al.  Reconstructing detailed dynamic face geometry from monocular video , 2013, ACM Trans. Graph..

[15]  Ira Kemelmacher-Shlizerman,et al.  Total Moving Face Reconstruction , 2014, ECCV.

[16]  Kun Zhou,et al.  Intrinsic Face Image Decomposition with Human Face Priors , 2014, ECCV.

[17]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[19]  Christian Theobalt,et al.  Reconstruction of Personalized 3D Face Rigs from Monocular Video , 2016, ACM Trans. Graph..

[20]  Derek Bradley,et al.  An anatomically-constrained local deformation model for monocular face capture , 2016, ACM Trans. Graph..

[21]  Jaakko Lehtinen,et al.  Production-level facial performance capture using deep convolutional neural networks , 2016, Symposium on Computer Animation.

[22]  Hao Li,et al.  Photorealistic Facial Texture Inference Using Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Patrick Pérez,et al.  MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Michael J. Black,et al.  Learning a model of facial shape and expression from 4D scans , 2017, ACM Trans. Graph..

[25]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[27]  Carlos D. Castillo,et al.  SfSNet: Learning Shape, Reflectance and Illuminance of Faces 'in the Wild' , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Xiaoming Liu,et al.  Nonlinear 3D Face Morphable Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Jaakko Lehtinen,et al.  Differentiable Monte Carlo ray tracing through edge sampling , 2018, ACM Trans. Graph..

[30]  M. Zollhöfer,et al.  Self-Supervised Multi-level Face Model Learning for Monocular Reconstruction at Over 250 Hz , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Yaser Sheikh,et al.  Modeling Facial Geometry Using Compositional VAEs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Andrew Jones,et al.  Mesoscopic Facial Geometry Inference Using Deep Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Derek Bradley,et al.  Practical dynamic facial appearance modeling and acquisition , 2018, ACM Trans. Graph..

[34]  Bernhard Egger,et al.  Morphable Face Models - An Open Framework , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[35]  Shigeo Morishima,et al.  High-fidelity facial reflectance and geometry inference from an unconstrained image , 2018, ACM Trans. Graph..

[36]  Patrick Pérez,et al.  State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications , 2018, Comput. Graph. Forum.

[37]  Feng Liu,et al.  Towards High-Fidelity Nonlinear 3D Face Morphable Model , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Xiaoming Liu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Michael J. Black,et al.  Learning to Regress 3D Face Shape and Expression From an Image Without 3D Supervision , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Jeffrey F. Cohn,et al.  The 2nd 3D Face Alignment in the Wild Challenge (3DFAW-Video): Dense Reconstruction From Video , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[41]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Louis Chevallier,et al.  Face Reflectance and Geometry Modeling via Differentiable Ray Tracing , 2019, ArXiv.

[43]  Hans-Peter Seidel,et al.  FML: Face Model Learning From Videos , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Wenzel Jakob,et al.  Reparameterizing discontinuous integrands for differentiable rendering , 2019, ACM Trans. Graph..

[45]  Bernhard Egger,et al.  A Morphable Face Albedo Model , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  FaceLab: Scalable Facial Performance Capture for Visual Effects , 2020, DigiPro.

[47]  Yun-Ta Tsai,et al.  Portrait shadow manipulation , 2020, ACM Trans. Graph..

[48]  Baoyuan Wang,et al.  Personalized Face Modeling for Improved Face Reconstruction and Motion Retargeting , 2020, ECCV.

[49]  Stefanos Zafeiriou,et al.  AvatarMe: Realistically Renderable 3D Facial Reconstruction “In-the-Wild” , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Hao Li,et al.  Learning Formation of Physically-Based Face Attributes , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Zhen Lei,et al.  Towards Fast, Accurate and Stable 3D Dense Face Alignment , 2020, ECCV.

[52]  T. Vetter,et al.  3D Morphable Face Models—Past, Present, and Future , 2019, ACM Trans. Graph..

[53]  Pablo Garrido,et al.  High-Fidelity Monocular Face Reconstruction Based on an Unsupervised Model-Based Face Autoencoder , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  William A. P. Smith,et al.  "Look Ma, No Landmarks!" - Unsupervised, Model-Based Dense Face Alignment , 2020, ECCV.

[55]  Christian Theobalt,et al.  Efficient and Differentiable Shadow Computation for Inverse Problems , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[56]  Louis Chevallier,et al.  Practical Face Reconstruction via Differentiable Ray Tracing , 2021, Comput. Graph. Forum.