High-Fidelity 3D Digital Human Creation from RGB-D Selfies

We present a fully automatic system that can produce high-fidelity, photo-realistic 3D digital human characters with a consumer RGB-D selfie camera. The system only needs the user to take a short selfie RGB-D video while rotating his/her head, and can produce a high quality reconstruction in less than 30 seconds. Our main contribution is a new facial geometry modeling and reflectance synthesis procedure that significantly improves the state-of-the-art. Specifically, given the input video a two-stage frame selection algorithm is first employed to select a few high-quality frames for reconstruction. A novel, differentiable renderer based 3D Morphable Model (3DMM) fitting method is then applied to recover facial geometries from multiview RGB-D data, which takes advantages of extensive data generation and perturbation. Our 3DMM has much larger expressive capacities than conventional 3DMM, allowing us to recover more accurate facial geometry using merely linear bases. For reflectance synthesis, we present a hybrid approach that combines parametric fitting and CNNs to synthesize high-resolution albedo/normal maps with realistic hair/pore/wrinkle details. Results show that our system can produce faithful 3D characters with extremely realistic details. Code and the constructed 3DMM is publicly available.

[1]  V. Lepetit,et al.  EPnP: An Accurate O(n) Solution to the PnP Problem , 2009, International Journal of Computer Vision.

[2]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[3]  Edward H. Adelson,et al.  A multiresolution spline with application to image mosaics , 1983, TOGS.

[4]  Christian Theobalt,et al.  Reconstruction of Personalized 3D Face Rigs from Monocular Video , 2016, ACM Trans. Graph..

[5]  Georgios Tzimiropoulos,et al.  Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Marcus A. Magnor,et al.  Sparse localized deformation components , 2013, ACM Trans. Graph..

[7]  Hao Li,et al.  Avatar digitization from a single image for real-time rendering , 2017, ACM Trans. Graph..

[8]  William T. Freeman,et al.  Unsupervised Training for 3D Morphable Model Regression , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Szymon Rusinkiewicz,et al.  Structure-aware hair capture , 2013, ACM Trans. Graph..

[10]  Yun Fu,et al.  Image Super-Resolution Using Very Deep Residual Channel Attention Networks , 2018, ECCV.

[11]  Frédo Durand,et al.  A Fast Approximation of the Bilateral Filter Using a Signal Processing Approach , 2006, International Journal of Computer Vision.

[12]  Xiangyu Zhu,et al.  High-fidelity Pose and Expression Normalization for face recognition in the wild , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Jihun Yu,et al.  Realtime facial animation with on-the-fly correctives , 2013, ACM Trans. Graph..

[15]  Xin Tong,et al.  Accurate and Robust 3D Facial Capture Using a Single RGBD Camera , 2013, 2013 IEEE International Conference on Computer Vision.

[16]  Thabo Beeler,et al.  3D Morphable Face Models—Past, Present, and Future , 2020, ACM Trans. Graph..

[17]  Günther Greiner,et al.  Automatic reconstruction of personalized avatars from 3D face scans , 2011, Comput. Animat. Virtual Worlds.

[18]  Ira Kemelmacher-Shlizerman,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 3d Face Reconstruction from a Single Image Using a Single Reference Face Shape , 2022 .

[19]  Paul E. Debevec,et al.  Digital ira and beyond: creating real-time photoreal digital actors , 2014, SIGGRAPH '14.

[20]  Dan Su,et al.  Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams , 2020, ArXiv.

[21]  Matan Sela,et al.  Learning Detailed Face Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[23]  Kun Zhou,et al.  High-quality hair modeling from a single portrait photo , 2015, ACM Trans. Graph..

[24]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  M. Zollhöfer,et al.  Self-Supervised Multi-level Face Model Learning for Monocular Reconstruction at Over 250 Hz , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Hao Li,et al.  3D hair synthesis using volumetric variational autoencoders , 2018, ACM Trans. Graph..

[27]  Derek Bradley,et al.  Lightweight eye capture using a parametric model , 2016, ACM Trans. Graph..

[28]  Tal Hassner,et al.  Extreme 3D Face Reconstruction: Seeing Through Occlusions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  King Ngi Ngan,et al.  MVF-Net: Multi-View 3D Face Morphable Model Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  PaulyMark,et al.  Dynamic 3D avatar creation from hand-held video input , 2015 .

[31]  Christian Theobalt,et al.  Reconstructing detailed dynamic face geometry from monocular video , 2013, ACM Trans. Graph..

[32]  Stefanos Zafeiriou,et al.  GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Hao Li,et al.  paGAN: real-time avatars using dynamic textures , 2019, ACM Trans. Graph..

[34]  Patrick Pérez,et al.  MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Fernando De la Torre,et al.  Interactive region-based linear 3D face models , 2011, ACM Trans. Graph..

[36]  Ioannis A. Kakadiaris,et al.  Multi-view 3D face reconstruction with deep recurrent neural networks , 2017, 2017 IEEE International Joint Conference on Biometrics (IJCB).

[37]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[38]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Thomas Gerig,et al.  Gaussian Process Morphable Models , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Andrew W. Fitzgibbon,et al.  Real-time non-rigid reconstruction using an RGB-D camera , 2014, ACM Trans. Graph..

[41]  Paul E. Debevec,et al.  Acquiring the reflectance field of a human face , 2000, SIGGRAPH.

[42]  Yangang Wang,et al.  Online modeling for realtime facial animation , 2013, ACM Trans. Graph..

[43]  Ron Kimmel,et al.  Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[44]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[45]  Derek Bradley,et al.  An empirical rig for jaw animation , 2018, ACM Trans. Graph..

[46]  Chongyang Ma,et al.  Single-view hair modeling using a hairstyle database , 2015, ACM Trans. Graph..

[47]  Xiaoming Liu,et al.  Nonlinear 3D Face Morphable Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Sami Romdhani,et al.  Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[49]  Kun Zhou,et al.  Real-time facial animation with image-based dynamic avatars , 2016, ACM Trans. Graph..

[50]  Justus Thies,et al.  Real-time expression transfer for facial reenactment , 2015, ACM Trans. Graph..

[51]  Stefanos Zafeiriou,et al.  A 3D Morphable Model Learnt from 10,000 Faces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Jianfei Cai,et al.  CNN-Based Real-Time Dense Face Reconstruction with Inverse-Rendered Photo-Realistic Face Images , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Erik Reinhard,et al.  Color Transfer between Images , 2001, IEEE Computer Graphics and Applications.

[54]  Justus Thies,et al.  Face2Face: real-time face capture and reenactment of RGB videos , 2019, Commun. ACM.

[55]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[56]  Hao Li,et al.  Photorealistic Facial Texture Inference Using Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Shigeo Morishima,et al.  High-fidelity facial reflectance and geometry inference from an unconstrained image , 2018, ACM Trans. Graph..

[58]  Steve Marschner,et al.  Appearance capture and modeling of human teeth , 2018, ACM Trans. Graph..

[59]  Thabo Beeler,et al.  High-quality single-shot capture of facial geometry , 2010, ACM Trans. Graph..

[60]  Derek Bradley,et al.  Accurate markerless jaw tracking for facial performance capture , 2019, ACM Trans. Graph..

[61]  Chengzhu Yu,et al.  DurIAN: Duration Informed Attention Network For Multimodal Synthesis , 2019, ArXiv.

[62]  Tal Hassner,et al.  Regressing Robust and Discriminative 3D Morphable Models with a Very Deep Neural Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Xin Tong,et al.  Automatic acquisition of high-fidelity facial performances using monocular videos , 2014, ACM Trans. Graph..

[64]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[65]  Harry Shum,et al.  Modeling hair from multiple views , 2005, ACM Trans. Graph..

[66]  Stefanos Zafeiriou,et al.  AvatarMe: Realistically Renderable 3D Facial Reconstruction “In-the-Wild” , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  K. S. Arun,et al.  Least-Squares Fitting of Two 3-D Point Sets , 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68]  Andrea Tagliasacchi,et al.  Modern techniques and applications for real-time non-rigid registration , 2016, SIGGRAPH ASIA Courses.

[69]  Yonggen Ling,et al.  Self-Supervised Learning of Detailed 3D Face Reconstruction , 2020, IEEE Transactions on Image Processing.

[70]  Mark Pauly,et al.  Semi-Supervised Facial Animation Retargeting , 2014 .

[71]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[72]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[73]  Patrick Pérez,et al.  State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications , 2018, Comput. Graph. Forum.

[74]  Derek Bradley,et al.  Model-based teeth reconstruction , 2016, ACM Trans. Graph..