Multi-level structured hybrid forest for joint head detection and pose estimation

Abstract. In real-world applications, factors such as illumination variation, occlusion, and poor image quality make head detection and pose estimation much more challenging. In this paper, we propose a multi-level structured hybrid forest (MSHF) for joint head detection and pose estimation. Our method extends the hybrid framework of classification and regression forests by introducing multi-level splitting functions and multi-structured features. The multi-level splitting functions are used to construct the trees in the different layers of the MSHF. The multi-structured features are extracted from randomly selected image patches, each of which belongs to either the head region or the background. The head contour is derived from these patches by MSHF regression of the signed distance from each patch center to the head contour. Sub-regions randomly selected from the patches within the head contour are then used to build the MSHF for head pose estimation in a coarse-to-fine manner. A weighted neighbor structured aggregation integrates the votes from the trees to estimate continuous pose angles. Experiments were conducted on public datasets and video streams. Compared with state-of-the-art methods, MSHF achieved improved performance and robustness, with an average accuracy of 90% and an average angular error of 6.6°. The average time for one joint head detection and pose estimation is about 0.44 s.
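
The abstract does not spell out the aggregation step, so the following is only a minimal sketch of one way per-tree votes over discrete pose bins could be combined into a continuous angle. It assumes each tree outputs a probability distribution over pose bins, that the bins are ordered by angle, and that the angle range does not wrap around. The function name `aggregate_votes`, the `neighborhood` parameter, and the weighting scheme are illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def aggregate_votes(tree_probs, bin_centers, neighborhood=1):
    """Estimate a continuous angle from per-tree votes (hypothetical sketch).

    tree_probs:   (n_trees, n_bins) array; each row is one tree's
                  probability distribution over the pose bins.
    bin_centers:  (n_bins,) array of bin center angles in degrees,
                  sorted in increasing order (no wrap-around assumed).
    neighborhood: number of bins on each side of the peak to include.
    """
    tree_probs = np.asarray(tree_probs, dtype=float)
    bin_centers = np.asarray(bin_centers, dtype=float)

    # Average the tree distributions to obtain the forest distribution.
    forest = tree_probs.mean(axis=0)

    # Locate the most voted bin and its neighbors, clipped to the valid range.
    peak = int(np.argmax(forest))
    lo = max(peak - neighborhood, 0)
    hi = min(peak + neighborhood, len(bin_centers) - 1)

    # Weighted mean of the neighboring bin centers, weighted by vote mass.
    weights = forest[lo:hi + 1]
    return float(np.dot(weights, bin_centers[lo:hi + 1]) / weights.sum())
```

For example, with bin centers at -90°, -45°, 0°, 45°, and 90°, two trees that each split their vote evenly between the 0° and 45° bins yield an estimate of 22.5°, which lies between the discrete classes.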
