Deep Transfer Feature Based Convolutional Neural Forests for Head Pose Estimation

In real-world applications, factors such as illumination, occlusion, and poor image quality make robust head pose estimation much more challenging. In this paper, a novel deep transfer feature based convolutional neural forest method (D-CNF) is proposed for head pose estimation. First, deep transfer features are extracted from facial patches by a transfer network model. Then, a D-CNF is devised to integrate random trees with the representation learning of deep convolutional neural networks for robust head pose estimation. In the learning process, we introduce a neurally connected split function (NCSF) as the node splitting strategy in a convolutional neural tree. Experiments were conducted on the public Pointing’04, BU3D-HP, and CCNU-HP facial datasets. Compared with state-of-the-art methods, the proposed method achieved considerably better performance and robustness, with average accuracies of 98.99% on BU3D-HP, 95.7% on Pointing’04, and 82.46% on CCNU-HP. In addition, unlike deep neural networks that require large-scale training data, our method performs well even when only a small amount of training data is available.
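The abstract does not spell out the neurally connected split function (NCSF). The following is a minimal sketch, assuming it follows the soft, sigmoid-gated splits of Kontschieder et al.'s deep neural decision forests. Each split node n routes a feature vector x to its left child with probability d_n(x) = σ(w_n·x + b_n). A leaf's reach probability μ_l(x) is the product of the split probabilities along its path, and the tree output is Σ_l μ_l(x) π_l. The class name `SoftTree`, the random weights and leaf distributions, the 16-dimensional vector standing in for a deep transfer feature, and the five pose bins are all hypothetical. Training (backpropagating into the split units and updating the leaves) is omitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SoftTree:
    """Hypothetical soft (neurally connected) decision tree of fixed depth.

    Split node n sends x left with probability d_n(x) = sigmoid(w_n . x + b_n).
    The linear unit is an assumed stand-in for the network output feeding that
    node. Leaf l stores a class distribution pi_l (here random placeholders).
    """

    def __init__(self, depth, n_features, n_classes, seed=0):
        rng = random.Random(seed)
        self.depth = depth
        n_split, n_leaves = 2 ** depth - 1, 2 ** depth
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_split)]
        self.b = [0.0] * n_split
        # Placeholder leaf distributions, each normalized to sum to 1.
        self.pi = []
        for _ in range(n_leaves):
            raw = [rng.random() + 1e-3 for _ in range(n_classes)]
            s = sum(raw)
            self.pi.append([r / s for r in raw])

    def split_probs(self, x):
        """d_n(x) for every split node, in breadth-first (heap) order."""
        return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def routing(self, x):
        """mu_l(x): probability that x reaches each leaf, left to right."""
        d = self.split_probs(x)
        mu = [1.0]  # x reaches the root with probability 1
        for level in range(self.depth):
            first = 2 ** level - 1  # heap index of the leftmost node on this level
            nxt = []
            for i, m in enumerate(mu):
                nxt.append(m * d[first + i])          # go left
                nxt.append(m * (1.0 - d[first + i]))  # go right
            mu = nxt
        return mu

    def predict_proba(self, x):
        """P_T(y | x) = sum over leaves of mu_l(x) * pi_l[y]."""
        mu = self.routing(x)
        n_classes = len(self.pi[0])
        return [sum(m * p[c] for m, p in zip(mu, self.pi)) for c in range(n_classes)]


def forest_predict(trees, x):
    """Forest output: average of the per-tree class distributions."""
    per_tree = [t.predict_proba(x) for t in trees]
    return [sum(p[c] for p in per_tree) / len(per_tree) for c in range(len(per_tree[0]))]


# Hypothetical usage: a 16-d feature vector, 5 pose bins, 3 trees of depth 3.
rng = random.Random(42)
feature = [rng.gauss(0.0, 1.0) for _ in range(16)]
forest = [SoftTree(depth=3, n_features=16, n_classes=5, seed=s) for s in range(3)]
probs = forest_predict(forest, feature)
best_bin = max(range(len(probs)), key=probs.__getitem__)
```

Each split passes m·d_n to one child and m·(1−d_n) to the other, so the leaf probabilities always sum to 1. Because d_n is a smooth sigmoid rather than a hard threshold, the output is differentiable in the split parameters. That differentiability is what lets such trees be trained jointly with a convolutional network.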
