Robust Real-Time 3 D Face Tracking from RGBD Videos under Extreme Pose , Depth , and Expression Variations

We introduce a novel end-to-end real-time pose-robust 3D face tracking framework from RGBD videos, which is capable of tracking head pose and facial actions simultaneously in unconstrained environment without intervention or pre-calibration from a user. In particular, we emphasize tracking the head pose from profile to profile and improving tracking performance in challenging instances, where the tracked subject is at a considerably large distance from the camera and the quality of data deteriorates severely. To achieve these goals, the tracker is guided by an efficient multi-view 3D shape regressor, trained upon generic RGB datasets, which is able to predict model parameters despite large head rotations or tracking range. Specifically, the shape regressor is made aware of the head pose by inferring the possibility of particular facial landmarks being visible through a joint regression-classification local random forest framework, and piecewise linear regression models effectively map visibility features into shape parameters. In addition, the regressor is combined with a joint 2D+3D optimization that sparsely exploits depth information to further refine shape parameters to maintain tracking accuracy over time. The result is a robust on-line RGBD 3D face tracker that can model extreme head poses and facial expressions accurately in challenging scenes, which are demonstrated in our extensive experiments.

[1]  K. Walker,et al.  View-based active appearance models , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[2]  Dimitris N. Metaxas,et al.  Optical Flow Constraints on Deformable Models with Applications to Face Tracking , 2000, International Journal of Computer Vision.

[3]  Zhe L. Lin,et al.  Nonparametric Context Modeling of Local Appearance for Pose- and Expression-Robust Facial Landmark Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Timothy F. Cootes,et al.  Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst..

[5]  Maja Pantic,et al.  Hierarchical On-line Appearance-Based Tracking for 3D head pose, eyebrows, lips, eyelids and irises , 2013, Image Vis. Comput..

[6]  Ben Glocker,et al.  Joint Classification-Regression Forests for Spatially Structured Multi-object Segmentation , 2012, ECCV.

[7]  Harry Shum,et al.  A Bayesian mixture model for multi-view face alignment , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Hai Xuan Pham,et al.  Hybrid On-Line 3D Face and Facial Actions Tracking in RGBD Video Sequences , 2014, 2014 22nd International Conference on Pattern Recognition.

[9]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Zhengyou Zhang,et al.  3D Deformable Face Tracking with a Commodity Depth Camera , 2010, ECCV.

[11]  Sami Romdhani,et al.  Efficient, robust and accurate fitting of a 3D morphable model , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[12]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[13]  Yangang Wang,et al.  Online modeling for realtime facial animation , 2013, ACM Trans. Graph..

[14]  Hao Li,et al.  Realtime performance-based facial animation , 2011, ACM Trans. Graph..

[15]  Kun Zhou,et al.  3D shape regression for real-time facial animation , 2013, ACM Trans. Graph..

[16]  Lijun Yin,et al.  A high-resolution 3D dynamic facial expression database , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[17]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[18]  Kok-Lim Low Linear Least-Squares Optimization for Point-to-Plane ICP Surface Registration , 2004 .

[19]  Jihun Yu,et al.  Realtime facial animation with on-the-fly correctives , 2013, ACM Trans. Graph..

[20]  Sami Romdhani,et al.  Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Mandy Eberhart,et al.  Decision Forests For Computer Vision And Medical Image Analysis , 2016 .

[23]  Feng Zhou,et al.  Exemplar-Based Graph Matching for Robust Facial Landmark Localization , 2013, 2013 IEEE International Conference on Computer Vision.

[24]  Luc Van Gool,et al.  Real-time facial feature detection using conditional regression forests , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Fernando De la Torre,et al.  Global supervised descent method , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Kun Zhou,et al.  Displaced dynamic expression regression for real-time facial tracking and animation , 2014, ACM Trans. Graph..

[28]  Shuicheng Yan,et al.  Multi-view face alignment using direct appearance models , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[29]  Thomas Vetter,et al.  A morphable model for the synthesis of 3D faces , 1999, SIGGRAPH.

[30]  Guido M. Cortelazzo,et al.  Time-of-Flight Cameras and Microsoft Kinect™ , 2012, Springer Briefs in Electrical and Computer Engineering.

[31]  Pushmeet Kohli,et al.  Real-Time Face Reconstruction from a Single Depth Image , 2014, 2014 2nd International Conference on 3D Vision.

[32]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Rin-ichiro Taniguchi,et al.  Augmented Blendshapes for Real-Time Simultaneous 3D Head Modeling and Facial Motion Capture , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Chongyu Chen,et al.  Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes , 2015, ArXiv.

[36]  Simon Lucey,et al.  Deformable Model Fitting by Regularized Landmark Mean-Shift , 2010, International Journal of Computer Vision.

[37]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[38]  Junzhou Huang,et al.  Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model , 2013, 2013 IEEE International Conference on Computer Vision.

[39]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.

[40]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.