Head Pose Estimation via Multi-Task Cascade CNN

Many face applications must perform three tasks: face detection, facial landmark localization, and head pose estimation. Most current methods handle these tasks separately. The multi-task cascaded convolutional neural network (MTCNN) adopts a cascade design that combines face detection and face alignment. Inspired by MTCNN, we combine face detection, head pose estimation, and key point detection in a single cascade framework. We also increase the number of detected key points from the 5 used by MTCNN to 21. By training and testing our model on the WIDER FACE and UMDFaces datasets, we explore the inherent correlation among these three facial tasks and demonstrate that the model performs well in unconstrained environments.
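
To make the three-task combination concrete, the following is a minimal sketch of how one cascade stage's output might be split into per-task predictions and scored with a weighted multi-task loss. Only the task list and the count of 21 key points come from the abstract. The flat output layout, the names `SIZES`, `split_outputs`, and `multitask_loss`, the choice of cross-entropy and mean squared error, and the loss weights are all illustrative assumptions, not the authors' implementation.

```python
import math

NUM_LANDMARKS = 21  # from the abstract; the original MTCNN predicts 5

# Hypothetical layout of one stage's flat output vector (an assumption):
# 2 face/non-face logits, 4 box offsets, (x, y) per key point, 3 pose angles.
SIZES = {"face": 2, "box": 4, "landmarks": 2 * NUM_LANDMARKS, "pose": 3}


def split_outputs(vec):
    """Slice a flat output vector into one list per task."""
    expected = sum(SIZES.values())
    if len(vec) != expected:
        raise ValueError(f"expected {expected} values, got {len(vec)}")
    out, start = {}, 0
    for name, size in SIZES.items():
        out[name] = vec[start:start + size]
        start += size
    return out


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(pred, target, weights):
    """Weighted sum of the per-task losses (weights are assumed values)."""
    losses = {"face": cross_entropy(pred["face"], target["face"])}
    for name in ("box", "landmarks", "pose"):
        losses[name] = mse(pred[name], target[name])
    return sum(weights[name] * losses[name] for name in losses)


# Usage: a dummy 51-value output compared against matching regression targets.
pred = split_outputs([0.0] * sum(SIZES.values()))
target = {"face": 1, "box": [0.0] * 4,
          "landmarks": [0.0] * (2 * NUM_LANDMARKS), "pose": [0.0] * 3}
weights = {"face": 1.0, "box": 0.5, "landmarks": 0.5, "pose": 0.5}
loss = multitask_loss(pred, target, weights)
```

Sharing one output vector across tasks is one simple way to let the tasks share features. How the three tasks are actually distributed across the cascade stages is not stated in the abstract, so this sketch makes no claim about it.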
