Learning an Animatable Detailed 3D Face Model from In-The-Wild Images

While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer from several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach to jointly learn a model with animatable detail and a detailed 3D face regressor from in-the-wild images; the approach recovers shape details as well as their relationship to facial expressions. Our DECA (Detailed Expression Capture and Animation) model is trained to robustly produce a UV displacement map from a low-dimensional latent representation that consists of person-specific detail parameters and generic expression parameters, while a regressor is trained to predict detail, shape, albedo, expression, pose, and illumination parameters from a single image. We introduce a novel detail-consistency loss to disentangle person-specific details from expression-dependent wrinkles. This disentanglement allows us to synthesize realistic person-specific wrinkles by controlling expression parameters while keeping person-specific details unchanged. DECA achieves state-of-the-art shape reconstruction accuracy on two benchmarks. Qualitative results on in-the-wild data demonstrate DECA's robustness and its ability to disentangle identity-dependent and expression-dependent details, which enables animation of reconstructed faces. The model and code are publicly available at https://github.com/YadiraF/DECA.
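
The animation idea described above, decoding a UV displacement map from a person-specific detail code plus an expression code and then changing only the expression, can be illustrated with a toy sketch. This is not DECA's network: the linear-plus-tanh decoder, the random weights, the code sizes, and the names `make_toy_decoder` and `decode_displacement` are all assumptions made for illustration only.

```python
import math
import random

# All sizes below are hypothetical and chosen only to keep the sketch small.
DETAIL_DIM = 4   # person-specific detail code
EXPR_DIM = 3     # generic expression code
UV_SIZE = 8      # the displacement map is UV_SIZE x UV_SIZE


def make_toy_decoder(seed=0):
    """Return random weights for a toy decoder (stand-in for a trained network)."""
    rng = random.Random(seed)
    latent_dim = DETAIL_DIM + EXPR_DIM
    return [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)]
            for _ in range(UV_SIZE * UV_SIZE)]


def decode_displacement(weights, detail_code, expr_code):
    """Map the concatenated latent code to a UV_SIZE x UV_SIZE displacement map."""
    latent = list(detail_code) + list(expr_code)
    # One linear unit per UV texel, squashed by tanh to a bounded displacement.
    flat = [math.tanh(sum(w * z for w, z in zip(row, latent))) for row in weights]
    return [flat[r * UV_SIZE:(r + 1) * UV_SIZE] for r in range(UV_SIZE)]


weights = make_toy_decoder()
detail = [0.5, -1.0, 0.3, 0.8]   # held fixed: stands for one person's details
neutral = [0.0, 0.0, 0.0]        # hypothetical neutral expression code
smile = [1.0, 0.2, -0.4]         # hypothetical "smile" expression code

# Same person, two expressions: only the expression code changes.
disp_neutral = decode_displacement(weights, detail, neutral)
disp_smile = decode_displacement(weights, detail, smile)
```

In this sketch, `disp_neutral` and `disp_smile` share the same detail code, so any difference between the two maps comes only from the expression code. That is the property the abstract attributes to the disentangled representation; in the actual model the decoder is learned from images rather than sampled at random.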
