Realtime Dynamic 3D Facial Reconstruction for Monocular Video In-the-Wild

With the growing number of videos recorded on 2D mobile cameras, recovering dynamic 3D facial models from monocular video has become a necessity for many image and video editing applications. While methods based on parametric 3D facial models can reconstruct 3D shape in dynamic environments, they ignore large structural changes. Structure-from-motion methods can reconstruct these changes but assume the object is static. To address this problem, we present a novel method for realtime dynamic 3D facial tracking and reconstruction from videos captured in uncontrolled environments. Our method can track the deforming facial geometry and reconstruct external objects that protrude from the face, such as glasses and hair. It also allows users to move around and perform facial expressions freely without degrading the reconstruction quality.
