Head Pose Estimation through Keypoints Matching between Reconstructed 3D Face Model and 2D Image

Mainstream methods treat head pose estimation as a supervised classification or regression problem, so their performance depends heavily on the accuracy of the ground-truth labels in the training data. In practice, however, accurate head pose labels are difficult to obtain because effective equipment and reliable labeling procedures are lacking. In this paper, we propose a method that requires no training with head pose labels; instead, it estimates head pose by matching keypoints between a reconstructed 3D face model and the 2D input image. The method consists of two components: 3D face reconstruction and 3D–2D keypoint matching. In the 3D face reconstruction phase, a personalized 3D face model is reconstructed from the input head image using convolutional neural networks that are jointly optimized with an asymmetric Euclidean loss and a keypoint loss. In the 3D–2D keypoint matching phase, an iterative optimization algorithm efficiently matches the keypoints of the reconstructed 3D face model to those of the 2D input image under the constraint of perspective transformation. The method is extensively evaluated on five widely used head pose estimation datasets: Pointing’04, BIWI, AFLW2000, Multi-PIE, and Pandora. The experimental results show that it achieves excellent cross-dataset performance and surpasses most existing state-of-the-art approaches, with average mean absolute errors (MAEs) of 4.78° on Pointing’04, 6.83° on BIWI, 7.05° on AFLW2000, 5.47° on Multi-PIE, and 5.06° on Pandora, even though the model is not trained on any of these five datasets.
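The 3D–2D keypoint matching phase can be illustrated with a minimal sketch. The abstract does not specify the iterative algorithm, so the code below substitutes a generic Levenberg-Marquardt-style damped Gauss-Newton loop that minimizes reprojection error under a pinhole (perspective) camera; it should not be read as the paper's method. The keypoint coordinates, the focal length, the Euler-angle convention, and all function names are assumptions made for illustration.

```python
import math

# Hypothetical 3D face keypoints in millimetres (x right, y down, z away from
# the camera, nose tip at the origin). Illustrative values, not from the paper.
MODEL_KEYPOINTS = [
    (0.0, 0.0, 0.0),       # nose tip
    (0.0, 63.0, 12.0),     # chin
    (-43.0, -32.0, 26.0),  # left eye outer corner
    (43.0, -32.0, 26.0),   # right eye outer corner
    (-28.0, 28.0, 24.0),   # left mouth corner
    (28.0, 28.0, 24.0),    # right mouth corner
]


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw, pitch, roll):
    """R = Rz(roll) * Ry(yaw) * Rx(pitch), angles in radians (assumed convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(rz, matmul3(ry, rx))


def project(pose, points3d, focal):
    """Perspective projection for pose = [yaw, pitch, roll, tx, ty, tz]."""
    r = rotation_matrix(pose[0], pose[1], pose[2])
    projected = []
    for x, y, z in points3d:
        xc = r[0][0] * x + r[0][1] * y + r[0][2] * z + pose[3]
        yc = r[1][0] * x + r[1][1] * y + r[1][2] * z + pose[4]
        zc = r[2][0] * x + r[2][1] * y + r[2][2] * z + pose[5]
        projected.append((focal * xc / zc, focal * yc / zc))
    return projected


def residuals(pose, points3d, points2d, focal):
    """Flattened reprojection errors (u, v per keypoint), in pixels."""
    res = []
    for (u, v), (u_obs, v_obs) in zip(project(pose, points3d, focal), points2d):
        res.extend((u - u_obs, v - v_obs))
    return res


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def estimate_pose(points3d, points2d, focal, init, max_iters=100, step=1e-6):
    """Damped Gauss-Newton (LM-style) fit of the six pose parameters."""
    pose = list(init)
    res = residuals(pose, points3d, points2d, focal)
    cost = sum(e * e for e in res)
    damping = 1e-3
    for _ in range(max_iters):
        # Forward-difference Jacobian; cols[k][i] = d res[i] / d pose[k].
        cols = []
        for k in range(6):
            shifted = pose[:]
            shifted[k] += step
            res_k = residuals(shifted, points3d, points2d, focal)
            cols.append([(a - b) / step for a, b in zip(res_k, res)])
        jtj = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(6)]
               for i in range(6)]
        grad = [sum(p * e for p, e in zip(cols[i], res)) for i in range(6)]
        for i in range(6):
            jtj[i][i] += damping * (jtj[i][i] + 1e-12)  # Marquardt scaling
        delta = solve_linear(jtj, [-g for g in grad])
        candidate = [p + d for p, d in zip(pose, delta)]
        cand_res = residuals(candidate, points3d, points2d, focal)
        cand_cost = sum(e * e for e in cand_res)
        if cand_cost < cost:   # accept the step and trust the model more
            pose, res, cost, damping = candidate, cand_res, cand_cost, damping * 0.1
            if cost < 1e-12:
                break
        else:                  # reject the step and damp more strongly
            damping *= 10.0
    return pose


# Synthetic check: project the model with a known pose, then recover that pose.
FOCAL = 800.0                                   # assumed focal length in pixels
TRUE_POSE = [0.25, -0.15, 0.10, 10.0, -5.0, 600.0]
observed = project(TRUE_POSE, MODEL_KEYPOINTS, FOCAL)
estimate = estimate_pose(MODEL_KEYPOINTS, observed, FOCAL,
                         init=[0.0, 0.0, 0.0, 0.0, 0.0, 550.0])
print("yaw, pitch, roll (deg):", [round(math.degrees(a), 3) for a in estimate[:3]])
```

In this sketch the 2D keypoints are synthetic and noise-free, so the fit only demonstrates the geometry. With a real detector the 2D keypoints carry noise, and the quality of the reconstructed 3D model directly affects the recovered angles.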
