Region-Based Context Enhanced Network for Robust Multiple Face Alignment

The recent studies for face alignment have involved developing an isolated algorithm on well-cropped face images. It is difficult to obtain the expected input by using an off-the-shelf face detector in practical applications. In this paper, we attempt to bridge between face detection and face alignment by establishing a novel joint multi-task model, which allows us to simultaneously detect multiple faces and their landmarks on a given scene image. In contrast to the pipeline-based framework by cascading separate models, we aim to propose an end-to-end convolutional network by sharing and transform feature representations between the task-specific modules. To learn a robust landmark estimator for unconstrained face alignment, three types of context enhanced blocks are designed to encode feature maps with multi-level context, multi-scale context, and global context. In the post-processing step, we develop a shape reconstruction algorithm based on point distribution model to refine the landmark outliers. Extensive experiments demonstrate that our results are robust for the landmark location task and insensitive to the location of estimated face regions. Furthermore, our method significantly outperforms recent state-of-the-art methods on several challenging datasets including 300 W, AFLW, and COFW.

[1]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jiwen Lu,et al.  Learning Cascaded Deep Auto-Encoder Networks for Face Alignment , 2016, IEEE Transactions on Multimedia.

[3]  Timothy F. Cootes,et al.  Feature Detection and Tracking with Constrained Local Models , 2006, BMVC.

[4]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Shiguang Shan,et al.  Occlusion-Free Face Alignment: Deep Regression Networks Coupled with De-Corrupt AutoEncoders , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[7]  Gee-Sern Hsu,et al.  Fast Landmark Localization With 3D Component Reconstruction and CNN for Cross-Pose Recognition , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[11]  Cheng Cheng,et al.  A Deep Regression Architecture with Two-Stage Re-initialization for High Performance Facial Landmark Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Kilian Q. Weinberger,et al.  Multi-Scale Dense Networks for Resource Efficient Image Classification , 2017, ICLR.

[15]  Xiangyu Zhang,et al.  Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Xiaoming Liu,et al.  Pose-Invariant 3D Face Alignment , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[18]  Iasonas Kokkinos,et al.  DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Ioannis Patras,et al.  Robust Face Alignment Under Occlusion via Regional Predictive Power Estimation , 2015, IEEE Transactions on Image Processing.

[20]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Michel F. Valstar,et al.  L2, 1-based regression and prediction accumulation across views for robust facial landmark detection , 2016, Image Vis. Comput..

[22]  George Trigeorgis,et al.  Mnemonic Descent Method: A Recurrent Process Applied for End-to-End Face Alignment , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Ying Wu,et al.  Detecting and Aligning Faces by Image Retrieval , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Cheng Li,et al.  Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  George Trigeorgis,et al.  Joint Multi-View Face Alignment in the Wild , 2017, IEEE Transactions on Image Processing.

[30]  Qingshan Liu,et al.  M3 CSR: Multi-view, multi-scale and multi-component cascade shape regression , 2016, Image Vis. Comput..

[31]  Stefanos Zafeiriou,et al.  300 Faces In-The-Wild Challenge: database and results , 2016, Image Vis. Comput..

[32]  Larry S. Davis,et al.  SSH: Single Stage Headless Face Detector , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[34]  Charless C. Fowlkes,et al.  Occlusion Coherence: Localizing Occluded Faces with a Hierarchical Deformable Part Model , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[36]  Xiaoming Liu,et al.  Pose-Invariant Face Alignment with a Single CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Cheng Li,et al.  Unconstrained Face Alignment via Cascaded Compositional Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Marios Savvides,et al.  CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection , 2016, ArXiv.

[39]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[40]  Qingming Huang,et al.  Face Distortion Recovery Based on Online Learning Database for Conversational Video , 2014, IEEE Transactions on Multimedia.

[41]  Jian Sun,et al.  Joint Cascade Face Detection and Alignment , 2014, ECCV.

[42]  Stefanos Zafeiriou,et al.  Menpo: A Comprehensive Platform for Parametric Image Alignment and Visual Deformable Models , 2014, ACM Multimedia.

[43]  Qiang Ji,et al.  Robust Facial Landmark Detection Under Significant Head Poses and Occlusion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[44]  Xiaoming Liu,et al.  Dense Face Alignment , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[45]  David J. Kriegman,et al.  Localizing parts of faces using a consensus of exemplars , 2011, CVPR.

[46]  Horst Bischof,et al.  Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[47]  Gerhard Rigoll,et al.  Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Ashraf A. Kassim,et al.  Recurrent 3D-2D Dual Learning for Large-Pose Facial Landmark Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[49]  Hanjiang Lai,et al.  Robust Facial Landmark Detection via Recurrent Attentive-Refinement Networks , 2016, ECCV.

[50]  Gang Hua,et al.  Supervised Transformer Network for Efficient Face Detection , 2016, ECCV.

[51]  Haoqiang Fan,et al.  Approaching human level facial landmark localization by deep learning , 2016, Image Vis. Comput..

[52]  Xu Tang,et al.  PyramidBox: A Context-assisted Single Shot Face Detector , 2018, ECCV.

[53]  Hao Wang,et al.  Face R-CNN , 2017, ArXiv.

[54]  Shifeng Zhang,et al.  S^3FD: Single Shot Scale-Invariant Face Detector , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[55]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[56]  Pietro Perona,et al.  Robust Face Landmark Estimation under Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[57]  William J. Christmas,et al.  Cascaded Collaborative Regression for Robust Facial Landmark Detection Trained Using a Mixture of Synthetic and Real Images With Dynamic Weighting , 2015, IEEE Transactions on Image Processing.

[58]  C. Taylor,et al.  Active shape models - 'Smart Snakes'. , 1992 .

[59]  Rama Chellappa,et al.  HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Yorgos Tzimiropoulos,et al.  Bulat , Adrian and Tzimiropoulos , Georgios ( 2016 ) Convolutional aggregation of local evidence for large pose face alignment , 2017 .

[61]  William J. Christmas,et al.  Dynamic Attention-Controlled Cascaded Shape Regression Exploiting Training Data Augmentation and Fuzzy-Set Sample Weighting , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63]  Qiang Ji,et al.  Simultaneous Facial Landmark Detection, Pose and Deformation Estimation Under Facial Occlusion , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Peiyun Hu,et al.  Finding Tiny Faces , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[66]  Chung-Hsien Wu,et al.  Two-Level Hierarchical Alignment for Semi-Coupled HMM-Based Audiovisual Emotion Recognition With Temporal Course , 2013, IEEE Transactions on Multimedia.

[67]  Qingshan Liu,et al.  Stacked Hourglass Network for Robust Facial Landmark Localisation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[68]  Xiaoming Liu,et al.  Large-Pose Face Alignment via CNN-Based Dense 3D Model Fitting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Yihong Gong,et al.  Face alignment recurrent network , 2018, Pattern Recognit..

[70]  Rama Chellappa,et al.  KEPLER: Keypoint and Pose Estimation of Unconstrained Faces by Learning Efficient H-CNN Regressors , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[71]  Xiaoou Tang,et al.  Learning Deep Representation for Face Alignment with Auxiliary Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[72]  Junzhou Huang,et al.  Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model , 2013, 2013 IEEE International Conference on Computer Vision.

[73]  Zhenan Sun,et al.  Combining Data-Driven and Model-Driven Methods for Robust Facial Landmark Detection , 2016, IEEE Transactions on Information Forensics and Security.

[74]  Xiaogang Wang,et al.  Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[75]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[76]  Ran Tao,et al.  Seeing Small Faces from Robust Anchor's Perspective , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[77]  J. Fernando Sánchez-Rada,et al.  MixedEmotions: An Open-Source Toolbox for Multimodal Emotion Analysis , 2018, IEEE Transactions on Multimedia.

[78]  Václav Hlavác,et al.  Real-time multi-view facial landmark detector learned by the structured output SVM , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[79]  Jean Meunier,et al.  Prototype-Based Modeling for Facial Expression Analysis , 2014, IEEE Transactions on Multimedia.

[80]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  Jiří Matas,et al.  Multi-view facial landmark detection by using a 3D shape model , 2016, Image Vis. Comput..

[83]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.

[84]  Yi Yang,et al.  Style Aggregated Network for Facial Landmark Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[85]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[86]  Yi Yang,et al.  Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[87]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.