Mixture of Deep Regression Networks for Head Pose Estimation

Accurate and robust head pose estimation is a challenging computer vision task. Most existing methods estimate head pose directly from a single modality, either RGB or depth images. These methods have two obvious drawbacks: (1) traditional shallow models are poor at learning representative features, and (2) relying on a single modality makes them sensitive to noise. In this work we therefore propose a novel multi-modal regression model for head pose estimation, named mixture of deep regression networks (MoDRN). MoDRN uses only the examples that are good for a given modality to learn that modality's sub-network parameters. As a result, the sub-networks tend to be better trained and more robust to noise, and their combination yields significantly improved performance. Experiments on public datasets such as BIWI and BU-3DFE show the effectiveness of our approach.
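The abstract does not specify how good examples are selected or how the sub-networks are combined, so the following is only a minimal sketch of the general idea under stated assumptions, not the MoDRN method itself. It assumes that a closed-form ridge regressor stands in for each deep sub-network, that "good" examples are those with the smallest residual under the current fit, and that modalities are combined by inverse-residual weighting. All function names and parameters (`fit_ridge`, `fit_on_good_examples`, `combine`, `keep_fraction`, `rounds`) are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression (assumption: stands in for a deep sub-network)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def fit_on_good_examples(X, y, keep_fraction=0.8, rounds=3):
    """Refit one modality's regressor using only its lowest-residual samples.

    Assumption: "good" means small residual under the current fit.
    """
    w = fit_ridge(X, y)
    n_keep = max(1, int(keep_fraction * len(X)))
    for _ in range(rounds):
        residual = np.linalg.norm(X @ w - y, axis=1)  # per-sample pose error
        good = np.argsort(residual)[:n_keep]          # indices of the best-fit samples
        w = fit_ridge(X[good], y[good])
    return w

def combine(preds, residual_scales):
    """Average per-modality predictions, weighted by inverse typical residual.

    Assumption: this weighting is illustrative only.
    """
    weights = 1.0 / (np.asarray(residual_scales, dtype=float) + 1e-8)
    weights = weights / weights.sum()
    return sum(wt * p for wt, p in zip(weights, preds))
```

Here `X` would hold features from one modality (for example RGB or depth) with one row per sample, and `y` the pose angles with shape `(n_samples, 3)`. A modality whose regressor has a larger typical residual contributes less to the combined estimate.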
