High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation

3D video avatars can empower virtual communications by providing compression, privacy, entertainment, and a sense of presence in AR/VR. Best 3D photo-realistic AR/VR avatars driven by video, that can minimize uncanny effects, rely on person-specific models. However, existing person-specific photo-realistic 3D models are not robust to lighting, hence their results typically miss subtle facial behaviors and cause artifacts in the avatar. This is a major drawback for the scalability of these models in communication systems (e.g., Messenger, Skype, FaceTime) and AR/VR. This paper addresses previous limitations by learning a deep learning lighting model, that in combination with a high-quality 3D face tracking algorithm, provides a method for subtle and robust facial motion transfer from a regular video to a 3D photo-realistic avatar. Extensive experimental validation and comparisons to other state-of-the-art methods demonstrate the effectiveness of the proposed framework in real-world scenarios with variability in pose, expression, and illumination. Our project page can be found at this website.

[1]  M. Zollhöfer,et al.  Self-Supervised Multi-level Face Model Learning for Monocular Reconstruction at Over 250 Hz , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Michael J. Black,et al.  Learning to Regress 3D Face Shape and Expression From an Image Without 3D Supervision , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Yaser Sheikh,et al.  Deep appearance models for face rendering , 2018, ACM Trans. Graph..

[4]  Michael Zollhöfer,et al.  Real-Time Reconstruction of Static and Dynamic Scenes , 2015 .

[5]  Patrick Pérez,et al.  MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Justus Thies,et al.  InverseFaceNet: Deep Single-Shot Inverse Face Rendering From A Single Image , 2017, ArXiv.

[7]  Matan Sela,et al.  3D Face Reconstruction by Learning from Synthetic Data , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[8]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[9]  Long Quan,et al.  Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency , 2020, ECCV.

[10]  Xi Zhou,et al.  Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network , 2018, ECCV.

[11]  Takaaki Shiratori,et al.  Self-Supervised Adaptation of High-Fidelity Face Models for Monocular Performance Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Xin Li,et al.  Superimposition-guided Facial Reconstruction from Skull , 2018, ArXiv.

[13]  Derek Bradley,et al.  High-quality passive facial performance capture using anchor frames , 2011, ACM Trans. Graph..

[14]  Philip H. S. Torr,et al.  Cross-Modal Deep Face Normals With Deactivable Skip Connections , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Bailin Deng,et al.  Lightweight Photometric Stereo for Facial Details Recovery , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Ruigang Yang,et al.  FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Xiaoming Liu,et al.  On Learning 3D Face Morphable Model from In-the-Wild Images , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Stefanos Zafeiriou,et al.  Large Scale 3D Morphable Models , 2017, International Journal of Computer Vision.

[19]  Chen Change Loy,et al.  Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[21]  Justus Thies,et al.  Real-time expression transfer for facial reenactment , 2015, ACM Trans. Graph..

[22]  Bernhard Egger,et al.  Morphable Face Models - An Open Framework , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[23]  Xiaogang Wang,et al.  Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.

[25]  Kun Zhou,et al.  Towards High-Fidelity 3D Face Reconstruction From In-the-Wild Images Using Graph Convolutional Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Jiaolong Yang,et al.  Accurate 3D Face Reconstruction With Weakly-Supervised Learning: From Single Image to Image Set , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[27]  Hans-Peter Seidel,et al.  FML: Face Model Learning From Videos , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Yiying Tong,et al.  Unconstrained 3D face reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  David K. McAllister,et al.  OptiX: a general purpose ray tracing engine , 2010, ACM Trans. Graph..

[30]  Volker Schönefeld Spherical Harmonics , 2019, An Introduction to Radio Astronomy.

[31]  Ron Kimmel,et al.  Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Donghoon Lee,et al.  Face alignment using cascade Gaussian process regression trees , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Luc Van Gool,et al.  Fast Optical Flow Using Dense Inverse Search , 2016, ECCV.

[34]  Christian Theobalt,et al.  Reconstruction of Personalized 3D Face Rigs from Monocular Video , 2016, ACM Trans. Graph..

[35]  Yi Xu,et al.  Lighting Estimation via Differentiable Screen-Space Rendering , 2020, 2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW).

[36]  Jean-François Lalonde,et al.  Learning Physics-Guided Face Relighting Under Directional Light , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[38]  Yaser Sheikh,et al.  Deep incremental learning for efficient high-fidelity face tracking , 2018, ACM Trans. Graph..

[39]  Günther Greiner,et al.  Real-Time Halfway Domain Reconstruction of Motion and Geometry , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[40]  Feng Liu,et al.  Towards High-Fidelity Nonlinear 3D Face Morphable Model , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  William J. Christmas,et al.  A Multiresolution 3D Morphable Face Model and Fitting Framework , 2016, VISIGRAPP.

[42]  Maja Pantic,et al.  Generic Active Appearance Models Revisited , 2012, ACCV.

[43]  Jianzhu Guo,et al.  Towards Fast, Accurate and Stable 3D Dense Face Alignment , 2020, ECCV.

[44]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[45]  Michael J. Black,et al.  Learning a model of facial shape and expression from 4D scans , 2017, ACM Trans. Graph..

[46]  Chen Cao,et al.  Stabilized real-time face tracking via a learned dynamic rigidity prior , 2018, ACM Trans. Graph..

[47]  Victor Lempitsky,et al.  Few-Shot Adversarial Learning of Realistic Neural Talking Head Models , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[48]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Yiying Tong,et al.  Adaptive 3D Face Reconstruction from Unconstrained Photo Collections , 2017, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[51]  Stefanos Zafeiriou,et al.  GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Chenliang Xu,et al.  Talking-head Generation with Rhythmic Head Motion , 2020, ECCV.

[53]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[54]  Carlos D. Castillo,et al.  SfSNet: Learning Shape, Reflectance and Illuminance of Faces 'in the Wild' , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[55]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[56]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[57]  Xiaoming Liu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.