Paper Information - Deep Transfer Feature Based Convolutional Neural Forests for Head Pose Estimation

In real-world applications, factors such as illumination, occlusion, and poor image quality make robust head pose estimation much more challenging. In this paper, a novel deep transfer feature based convolutional neural forest (D-CNF) method is proposed for head pose estimation. First, deep transfer features are extracted from facial patches by a transfer network model. Then, a D-CNF is devised to integrate random trees with the representation learning of deep convolutional neural networks for robust head pose estimation. In the learning process, we introduce a neurally connected split function (NCSF) as the node splitting strategy in a convolutional neural tree. Experiments were conducted on the public Pointing’04, BU3D-HP, and CCNU-HP facial datasets. Compared to state-of-the-art methods, the proposed method achieved much improved performance and strong robustness, with an average accuracy of 98.99% on the BU3D-HP dataset, 95.7% on Pointing’04, and 82.46% on the CCNU-HP dataset. In addition, in contrast to deep neural networks, which require large-scale training data, our method performs well even when only a small amount of training data is available.
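The abstract does not spell out the form of the NCSF, so the following is only a minimal sketch of what a neurally connected split might look like. It assumes a single linear unit followed by a sigmoid, applied to a deep feature vector, whose output is read as the probability of routing a sample to the left child. The names `ncsf_split` and `route`, the weights, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
import math


def ncsf_split(features, weights, bias):
    """Probability of sending a sample to the left child (assumed form).

    Assumption: the split is a sigmoid over a linear projection of the
    deep transfer features; the paper's actual NCSF may differ.
    """
    activation = sum(w * f for w, f in zip(weights, features)) + bias
    # Numerically stable sigmoid for large positive or negative inputs.
    if activation >= 0:
        return 1.0 / (1.0 + math.exp(-activation))
    e = math.exp(activation)
    return e / (1.0 + e)


def route(features, weights, bias, threshold=0.5):
    """Hard routing decision derived from the soft split (hypothetical)."""
    p_left = ncsf_split(features, weights, bias)
    return "left" if p_left >= threshold else "right"


if __name__ == "__main__":
    # Toy 3-dimensional "deep feature" and arbitrary illustrative weights.
    sample = [0.2, -1.0, 0.5]
    w = [1.0, 0.5, -0.25]
    print(ncsf_split(sample, w, bias=0.0), route(sample, w, bias=0.0))
```

In a tree, each internal node would hold its own weights and bias, and a sample would be passed down by repeated calls to `route` until it reaches a leaf that stores a head pose distribution.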

Yuanyuan Liu | Zhong Xie | Fang Fang | Xi Gong
