Cascaded Pose Regression Revisited: Face Alignment in Videos

Automated pose estimation is a fundamental task in computer vision. In this paper, we investigate the generic framework of Cascaded Pose Regression (CPR), which has proven effective in practice for estimating the pose of deformable and articulated objects. In particular, we focus on the use of CPR for face alignment, surveying existing techniques and verifying their performance on several public facial datasets. We show that the correct selection of pose-invariant features is critical both for encoding the geometric arrangement of landmarks and for the overall learnability of the regressor. Furthermore, by incorporating strategies commonly used in state-of-the-art methods, we interpret the CPR training procedure as a repeated clustering problem with an explicit regressor representation, an interpretation that is complementary to the original CPR algorithm. In our experiments, a qualitative evaluation of existing alignment techniques demonstrates the success of CPR for facial pose inference, which can be conveniently adapted to video detection and tracking applications.
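
For readers unfamiliar with cascaded regression, the inference loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in the paper: it assumes an additive shape update with linear stage regressors, and the names `shape_indexed_features`, `apply_cascade`, `stages`, and `offsets` are hypothetical. The original CPR formulation composes pose updates and typically uses fern regressors rather than the linear maps used here.

```python
import numpy as np

def shape_indexed_features(image, shape, offsets):
    """Sample pixel intensities at fixed (x, y) offsets around each current landmark.

    image:   (H, W) grayscale array
    shape:   (L, 2) landmark coordinates as (x, y)
    offsets: (K, 2) offsets applied to every landmark (hypothetical feature design)
    """
    h, w = image.shape
    pts = shape[:, None, :] + offsets[None, :, :]            # (L, K, 2)
    xs = np.clip(np.rint(pts[..., 0]).astype(int), 0, w - 1)  # clamp to image bounds
    ys = np.clip(np.rint(pts[..., 1]).astype(int), 0, h - 1)
    return image[ys, xs].ravel()                              # (L * K,)

def apply_cascade(image, init_shape, stages, offsets):
    """Refine an initial shape with a sequence of (W, b) linear stage regressors.

    Assumption: each stage predicts an additive shape increment from features
    re-sampled at the current shape estimate.
    """
    shape = np.array(init_shape, dtype=float)  # copy so the caller's array is untouched
    for W, b in stages:
        phi = shape_indexed_features(image, shape, offsets)
        shape = shape + (phi @ W + b).reshape(shape.shape)
    return shape

# Toy usage: zero weights and unit biases shift every coordinate by 1 per stage.
image = np.zeros((10, 10))
init = np.array([[1.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
offsets = np.array([[0, 0], [1, 0], [0, 1], [-1, -1]])
stages = [(np.zeros((12, 6)), np.ones(6)) for _ in range(2)]
refined = apply_cascade(image, init, stages, offsets)
```

The key property the sketch tries to convey is that features are re-indexed by the current estimate at every stage, so later regressors see inputs that depend on the output of earlier ones.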
