Instant Multi-View Head Capture through Learnable Registration

Existing methods for capturing datasets of 3D heads in dense semantic correspondence are slow and commonly address the problem in two separate steps: multi-view stereo (MVS) reconstruction followed by non-rigid registration. To simplify this process, we introduce TEMPEH (Towards Estimation of 3D Meshes from Performances of Expressive Heads) to directly infer 3D heads in dense correspondence from calibrated multi-view images. Registering datasets of 3D scans typically requires manual parameter tuning to find the right balance between accurately fitting the scan surfaces and being robust to scanning noise and outliers. Instead, we propose to jointly register a 3D head dataset while training TEMPEH. Specifically, during training we minimize a geometric loss commonly used for surface registration, effectively leveraging TEMPEH as a regularizer. Our multi-view head inference builds on a volumetric feature representation that samples and fuses features from each view using camera calibration information. To account for partial occlusions and a large capture volume that enables head movements, we use view- and surface-aware feature fusion and a spatial transformer-based head localization module, respectively. We use raw MVS scans as supervision during training, but, once trained, TEMPEH directly predicts 3D heads in dense correspondence without requiring scans. Predicting one head takes about 0.3 seconds with a median reconstruction error of 0.26 mm, 64% lower than the current state of the art. This enables the efficient capture of large datasets containing multiple people and diverse facial motions. Code, model, and data are publicly available at
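The two core operations described in the abstract, sampling and fusing per-view features into a volume using camera calibration, and a robust geometric loss against raw scans, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the TEMPEH implementation. Assumptions: cameras are 3x4 projection matrices P = K[R|t]; feature maps are nested lists indexed `[row][col][channel]`; views are fused by a per-channel mean and variance (a common choice in learned MVS, standing in for TEMPEH's view- and surface-aware fusion); and the registration loss uses nearest-vertex distances with a Geman-McClure robust penalty as a stand-in for a true point-to-surface distance. All function names (`project`, `bilinear`, `fuse_point`, `build_volume`, `geman_mcclure`, `scan_to_mesh_loss`) are hypothetical.

```python
import math

# Minimal sketch under stated assumptions; NOT the TEMPEH implementation.
# Assumed conventions: cameras are 3x4 projection matrices P = K[R|t];
# feature maps are nested lists indexed feat[row][col][channel].


def project(P, X):
    """Project 3D point X with camera P; return pixel (u, v), or None if behind the camera."""
    Xh = (X[0], X[1], X[2], 1.0)
    x, y, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    if w <= 0:
        return None
    return x / w, y / w


def bilinear(feat, u, v):
    """Bilinearly interpolate feat at column u, row v; return None outside the image."""
    H, W = len(feat), len(feat[0])
    if not (0 <= u <= W - 1 and 0 <= v <= H - 1):
        return None
    c0, r0 = int(math.floor(u)), int(math.floor(v))
    c1, r1 = min(c0 + 1, W - 1), min(r0 + 1, H - 1)
    a, b = u - c0, v - r0
    return [
        (1 - a) * (1 - b) * feat[r0][c0][k] + a * (1 - b) * feat[r0][c1][k]
        + (1 - a) * b * feat[r1][c0][k] + a * b * feat[r1][c1][k]
        for k in range(len(feat[0][0]))
    ]


def fuse_point(X, cameras, feature_maps):
    """Sample X in every view that sees it; fuse by per-channel mean and variance.
    Assumption: mean/variance fusion (common in learned MVS), not TEMPEH's
    view- and surface-aware fusion."""
    samples = []
    for P, feat in zip(cameras, feature_maps):
        uv = project(P, X)
        f = bilinear(feat, *uv) if uv is not None else None
        if f is not None:
            samples.append(f)
    if not samples:
        return None  # point is not visible in any view
    n, C = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(C)]
    var = [sum((s[k] - mean[k]) ** 2 for s in samples) / n for k in range(C)]
    return mean + var  # concatenated 2C-dimensional feature


def build_volume(center, half_size, res, cameras, feature_maps):
    """Fuse features at every node of a res^3 grid spanning center +/- half_size (res >= 2)."""
    step = 2.0 * half_size / (res - 1)
    return {
        (i, j, k): fuse_point(
            tuple(center[a] - half_size + n * step for a, n in enumerate((i, j, k))),
            cameras, feature_maps)
        for i in range(res) for j in range(res) for k in range(res)
    }


def geman_mcclure(sq_dist, sigma):
    """Robust penalty: about sq_dist for small residuals, saturating at sigma**2 for outliers."""
    return sigma ** 2 * sq_dist / (sq_dist + sigma ** 2)


def scan_to_mesh_loss(scan_points, mesh_vertices, sigma=1.0):
    """Mean robust distance from each raw scan point to its nearest predicted vertex.
    Assumption: nearest-vertex distance stands in for a true point-to-surface distance."""
    total = 0.0
    for p in scan_points:
        d2 = min(sum((p[c] - v[c]) ** 2 for c in range(3)) for v in mesh_vertices)
        total += geman_mcclure(d2, sigma)
    return total / len(scan_points)
```

The robust penalty illustrates the trade-off the abstract mentions: residuals well below `sigma` are penalized roughly quadratically, so the predicted surface is pulled close to the scan, while residuals far above `sigma` saturate at `sigma**2`, so scanning noise and outliers cannot dominate the loss. In this sketch the grid placement (`center`, `half_size`) is simply given as input; in TEMPEH, locating the head within a large capture volume is the job of the spatial transformer-based localization module.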
