Robust Facial Landmark Detection via Recurrent Attentive-Refinement Networks

In this work, we introduce a novel Recurrent Attentive-Refinement (RAR) network for facial landmark detection under unconstrained conditions, where facial occlusions and pose variations are major challenges. RAR follows the cascaded-regression pipeline, which refines landmark locations progressively. However, instead of updating all landmark locations together, RAR refines them sequentially at each recurrent stage. In this way, more reliable landmarks are refined earlier and help to infer the locations of more challenging landmarks that may be affected by occlusions and/or extreme poses. RAR can thus effectively control detection errors from those challenging landmarks and improve overall performance even in the presence of heavy occlusions and/or extreme conditions. To determine the order in which landmarks are refined, RAR employs an attentive-refinement mechanism built from two models: an attention LSTM (A-LSTM) and a refinement LSTM (R-LSTM). At each recurrent stage, A-LSTM implicitly identifies a reliable landmark as the attention center. Following the sequence of attention centers, R-LSTM sequentially refines the landmarks near or correlated with the attention centers and produces the final detection results. To further enhance robustness, instead of using the mean shape for initialization, RAR adaptively selects the initialization from a pool of shape centers clustered from all training shapes. As an end-to-end trainable model, RAR demonstrates superior performance in detecting challenging landmarks in comprehensive experiments and establishes new state-of-the-art results on the 300-W, COFW and AFLW benchmark datasets.
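
The control flow described above (pick an initial shape from a pool, then refine landmarks one attention center at a time, most reliable first) can be illustrated with a minimal Python sketch. This is not the RAR model: the learned A-LSTM is replaced by fixed, hand-supplied reliability scores, the learned R-LSTM by a Gaussian distance weighting around the attention center, and the adaptive initialization by a nearest-center rule. The names `select_initial_shape`, `sequential_refine`, `predict_update`, `reliability` and `sigma` are all hypothetical and introduced only for illustration.

```python
import math


def select_initial_shape(shape_centers, rough_shape):
    """Return the pooled shape center closest (squared L2) to a rough estimate.

    Assumption: a nearest-center rule stands in for RAR's learned selection.
    """
    def sq_dist(a, b):
        return sum((ax - bx) ** 2 + (ay - by) ** 2
                   for (ax, ay), (bx, by) in zip(a, b))
    return min(shape_centers, key=lambda c: sq_dist(c, rough_shape))


def sequential_refine(shape, reliability, predict_update, sigma=1.0):
    """Refine landmarks one attention center at a time, most reliable first.

    shape          : list of (x, y) landmark positions
    reliability    : one score per landmark (stand-in for A-LSTM attention)
    predict_update : hypothetical regressor, maps a shape to per-landmark (dx, dy)
    sigma          : width of the Gaussian weighting around the attention center
    """
    shape = [tuple(p) for p in shape]
    # Visit landmarks in decreasing order of reliability.
    order = sorted(range(len(shape)), key=lambda i: -reliability[i])
    for center_idx in order:
        cx, cy = shape[center_idx]
        updates = predict_update(shape)
        new_shape = []
        for (x, y), (dx, dy) in zip(shape, updates):
            # Landmarks near the attention center receive a larger share
            # of their predicted update; the center itself gets weight 1.
            w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            new_shape.append((x + w * dx, y + w * dy))
        shape = new_shape
    return shape, order


# Toy usage: an "oracle" regressor that returns the exact residual to a
# known ground-truth shape, purely to exercise the control flow.
truth = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
start = [(0.2, -0.1), (1.3, 0.2), (0.4, 1.4)]
reliability = [0.9, 0.2, 0.6]


def oracle(current):
    return [(tx - x, ty - y) for (x, y), (tx, ty) in zip(current, truth)]


refined, order = sequential_refine(start, reliability, oracle)
print(order)
print(refined)
```

In this toy setting the least reliable landmark is updated last, after the more reliable ones have already been moved, which mirrors the ordering idea in the abstract. A real system would learn both the attention order and the updates from image features rather than take them as inputs.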
