Real-Time Facial Motion Capture Using RGB-D Images Under Complex Motion and Occlusions

We present a technique for capturing facial performance in real time using an RGB-D camera. Such method can be applied to face augmentation by leveraging facial expression changes. The technique is able to perform both 3D facial modeling and facial motion tracking without the need of pre-scanning or training for a specific user. The proposed approach builds on an existing method that we refer as FaceCap, which uses a blendshape representation and a Bump image for tracking facial motion and capturing geometric details. The original FaceCap algorithm fails in some scenarios with complex motion and occlusions, mainly due to problems in the face detection and tracking steps. FaceCap also has problems with the Bump image filtering step that generates outliers, causing more distortion on the 3D augmented blendshape. In order to solve these problems, we propose two refinements: (a) a new framework for face detection and landmark localization based on the state-of-the-art methods MTCNN and CE-CLM, respectively; and (b) a simple but effective modification in the filtering step, removing reconstruction failures in the eye region. Experiments showed that the proposed approach can deal with unconstrained scenarios, such as large head pose variations and partial occlusions, while achieving real-time execution.

[1]  K. Walker,et al.  View-based active appearance models , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[2]  Feng Liu,et al.  Joint Face Alignment and 3D Face Reconstruction , 2016, ECCV.

[3]  Louis-Philippe Morency,et al.  OpenFace 2.0: Facial Behavior Analysis Toolkit , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[4]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[6]  Xiaoming Liu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Li-Jia Li,et al.  Multi-view Face Detection Using Deep Convolutional Neural Networks , 2015, ICMR.

[8]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[9]  Vladlen Koltun,et al.  Elastic Fragments for Dense Scene Reconstruction , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.

[11]  Jongmoo Choi,et al.  Laser scan quality 3-D face modeling using a low-cost depth camera , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[12]  Timothy F. Cootes,et al.  Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst..

[13]  Xin Tong,et al.  Accurate and Robust 3D Facial Capture Using a Single RGBD Camera , 2013, 2013 IEEE International Conference on Computer Vision.

[14]  Tal Hassner,et al.  Extreme 3D Face Reconstruction: Seeing Through Occlusions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Rin-ichiro Taniguchi,et al.  Augmented Blendshapes for Real-Time Simultaneous 3D Head Modeling and Facial Motion Capture , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Yangang Wang,et al.  Online modeling for realtime facial animation , 2013, ACM Trans. Graph..

[18]  Peter Robinson,et al.  3D Constrained Local Model for rigid and non-rigid facial tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Gernot A. Fink,et al.  Face Detection Using GPU-Based Convolutional Neural Networks , 2009, CAIP.

[20]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[21]  Peter Robinson,et al.  Constrained Local Neural Fields for Robust Facial Landmark Detection in the Wild , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[22]  Akihiro Sugimoto,et al.  A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Veronica Teichrieb,et al.  Interactive Makeup Tutorial Using Face Tracking and Augmented Reality on Mobile Devices , 2015, 2015 XVII Symposium on Virtual and Augmented Reality.

[24]  Thomas S. Huang,et al.  Human face detection in a complex background , 1994, Pattern Recognit..

[25]  Patrick Pérez,et al.  State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications , 2018, Comput. Graph. Forum.

[26]  Timothy F. Cootes,et al.  Feature Detection and Tracking with Constrained Local Models , 2006, BMVC.

[27]  Klaus J. Kirchberg,et al.  Robust Face Detection Using the Hausdorff Distance , 2001, AVBPA.

[28]  Marc Levoy,et al.  Efficient variants of the ICP algorithm , 2001, Proceedings Third International Conference on 3-D Digital Imaging and Modeling.

[29]  Hao Li,et al.  Realtime performance-based facial animation , 2011, ACM Trans. Graph..

[30]  Louis-Philippe Morency,et al.  Convolutional Experts Constrained Local Model for 3D Facial Landmark Detection , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[31]  Ramakant Nevatia,et al.  ExpNet: Landmark-Free, Deep, 3D Facial Expressions , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[32]  Anil K. Jain,et al.  Face Detection in Color Images , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[33]  Sridha Sridharan,et al.  Efficient constrained local model fitting for non-rigid face alignment , 2009, Image Vis. Comput..

[34]  Jihun Yu,et al.  Unconstrained realtime facial performance capture , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Ronald Azuma,et al.  A Survey of Augmented Reality , 1997, Presence: Teleoperators & Virtual Environments.

[36]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[37]  Harry Shum,et al.  Face alignment using texture-constrained active shape models , 2003, Image Vis. Comput..

[38]  Tal Hassner,et al.  Regressing Robust and Discriminative 3D Morphable Models with a Very Deep Neural Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Zhengyou Zhang,et al.  Improving multiview face detection with multi-task deep convolutional neural networks , 2014, IEEE Winter Conference on Applications of Computer Vision.

[40]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.