Robust head pose estimation based on key frames for human-machine interaction

Humans can interact with several kinds of machines (motor vehicles and robots, among others) in different ways. One such way is through their head pose. In this work, we propose a head pose estimation framework that combines 2D and 3D cues using the concept of key frames (KFs). KFs are a set of frames, learned automatically offline, that consist of the following: 2D features, encoded through Speeded Up Robust Features (SURF) descriptors; 3D information, captured by Fast Point Feature Histogram (FPFH) descriptors; and the target's head orientation (pose) in real-world coordinates, represented through a 3D facial model. The KF information is then reinforced through a global optimization process that minimizes error in a manner similar to bundle adjustment. In an online process, the KFs allow us to formulate a hypothesis of the head pose in new images, which is then refined by the iterative closest point (ICP) algorithm. This KF-based framework can handle partial occlusions and extreme rotations, even with noisy depth data, improving both pose estimation accuracy and detection rate. We evaluate the proposal on two public state-of-the-art benchmarks: (1) the BIWI Kinect Head Pose Database and (2) the ICT 3D HeadPose Database. In addition, we evaluate the framework on a small but challenging dataset of our own, in which the targets perform more complex behaviors than those in the aforementioned public datasets. We show that our approach outperforms relevant state-of-the-art proposals on all these datasets.
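The abstract does not specify the matching criterion, the key-frame scoring rule, or the ICP variant. The following is therefore only a minimal Python/NumPy sketch of the online step under stated assumptions. SURF descriptors for the query image and for each key frame are assumed to be precomputed as arrays (extraction is not shown). The key frame with the most matches passing Lowe's ratio test (threshold 0.8, an assumed value) supplies the initial pose hypothesis. That hypothesis is then refined by plain point-to-point ICP with an SVD (Kabsch) pose update and brute-force nearest neighbours. The function names and the key-frame dictionary layout are hypothetical. FPFH descriptors and the offline bundle-adjustment-like optimization are omitted.

```python
import numpy as np


def count_ratio_matches(query, keyframe, ratio=0.8):
    """Count query descriptors whose nearest key-frame descriptor passes Lowe's ratio test.

    query: (n, d) array; keyframe: (m, d) array with m >= 2.
    """
    # Brute-force Euclidean distances between every query and key-frame descriptor.
    dist = np.linalg.norm(query[:, None, :] - keyframe[None, :, :], axis=2)
    dist.sort(axis=1)
    # Accept a match only if the best distance is clearly smaller than the second best.
    return int(np.sum(dist[:, 0] < ratio * dist[:, 1]))


def select_keyframe(query_desc, keyframes, ratio=0.8):
    """Return the stored (R, t) of the key frame with the most accepted matches.

    Each key frame is a hypothetical dict {"descriptors": (m, d), "R": (3, 3), "t": (3,)}.
    """
    scores = [count_ratio_matches(query_desc, kf["descriptors"], ratio) for kf in keyframes]
    best = keyframes[int(np.argmax(scores))]
    return best["R"].copy(), best["t"].copy()


def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst_i ~= R @ src_i + t (Kabsch/SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1), not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(model, scene, R, t, iters=50):
    """Point-to-point ICP: refine the hypothesis (R, t) that maps model points onto scene points."""
    for _ in range(iters):
        moved = model @ R.T + t  # apply the current pose to every model point
        # Nearest scene point for each moved model point (brute force, no k-d tree).
        dist = np.linalg.norm(moved[:, None, :] - scene[None, :, :], axis=2)
        matches = scene[np.argmin(dist, axis=1)]
        # Re-estimate the full model-to-scene transform from the current correspondences.
        R, t = best_rigid_transform(model, matches)
    return R, t
```

In use, `R0, t0 = select_keyframe(query_desc, keyframes)` would yield the hypothesis, and `icp(model_points, scene_points, R0, t0)` would refine it. Here `model_points` stands for the 3D facial model and `scene_points` for the depth points of the detected head. Point-to-point ICP is a local method that converges reliably only from a nearby initial pose. Seeding it with a key-frame pose instead of the identity is presumably what makes large rotations tractable.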
