Stabilized real-time face tracking via a learned dynamic rigidity prior

Despite the popularity of real-time monocular face tracking systems in many successful applications, one overlooked problem with these systems is rigid instability. It occurs when the input facial motion can be explained by either a head pose change or a facial expression change, creating ambiguities that often lead to jittery and unstable rigid head poses under large expressions. Existing rigid stabilization methods either employ heavyweight, anatomically motivated approaches that are unsuitable for real-time applications, or rely on heuristic rules that can fail under certain expressions. We propose the first rigid stabilization method for real-time monocular face tracking that uses a dynamic rigidity prior learned from realistic datasets. The prior is defined on a region-based face model and provides dynamic, region-based adaptivity for rigid pose optimization during real-time performance. We introduce an effective offline training scheme that learns the dynamic rigidity prior by optimizing the convergence of the rigid pose optimization toward the ground-truth poses in the training data. Our real-time face tracking system is an optimization framework that alternates between rigid pose optimization and expression optimization. To ensure tracking accuracy, we combine both robust, drift-free facial landmarks and dense optical flow in the optimization objectives. We evaluate our system extensively against state-of-the-art monocular face tracking systems and achieve significant improvement in tracking accuracy on a high-quality face tracking benchmark. Our system improves facial-performance-based applications such as facial animation retargeting and virtual face makeup by providing accurate expressions and stable poses. We further validate the dynamic rigidity prior by comparing its tracking accuracy against alternative variants.
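The alternating structure described above (rigid pose step, then expression step, with rigidity weights down-weighting non-rigid regions in the pose fit) can be illustrated with a deliberately simplified toy sketch. This is a hypothetical 2D illustration, not the authors' solver: the learned dynamic rigidity prior is replaced by fixed per-point weights, the face model has a single blendshape, and the landmark/optical-flow objectives are reduced to plain point residuals.

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix for angle theta (stand-in for the rigid head pose)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Hypothetical toy face model: neutral 2D points, one expression blendshape,
# and per-point rigidity weights (a crude stand-in for the learned prior).
neutral = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
blendshape = np.array([[0.0, 0.0], [0.3, 0.2], [0.0, 0.0], [-0.2, 0.1]])
rigidity = np.array([1.0, 0.1, 1.0, 0.1])  # rigid regions dominate the pose fit

def track(observed, iters=50):
    """Alternate between a rigidity-weighted rigid pose fit and an
    expression (blendshape coefficient) fit, as in the paper's outer loop."""
    theta, alpha = 0.0, 0.0
    for _ in range(iters):
        # Rigid pose step: weighted 2D Procrustes rotation in closed form,
        # using the rigidity weights so expression-prone points count less.
        shape = neutral + alpha * blendshape
        H = (rigidity[:, None] * shape).T @ observed  # weighted cross-covariance
        theta = np.arctan2(H[0, 1] - H[1, 0], H[0, 0] + H[1, 1])
        # Expression step: with the pose fixed, fit alpha by least squares.
        R = rotation(theta)
        b = (observed - neutral @ R.T).ravel()
        A = (blendshape @ R.T).ravel()
        alpha = A @ b / (A @ A)
    return theta, alpha
```

On noiseless synthetic input generated with a known pose and expression, the alternation recovers both; in the real system each step would instead minimize landmark and dense optical-flow energies, with the learned prior adapting the region weights per frame.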
