Deep Convolutional Neural Network-based Bernoulli Heatmap for Head Pose Estimation

Head pose estimation is a crucial problem for many tasks, such as driver attention monitoring, fatigue detection, and human behaviour analysis. Neural networks are generally considered better at handling classification problems than regression problems. Having the network output angle values directly makes the mapping to be learned extremely nonlinear, and the loss function then constrains the weights only weakly. This paper proposes a novel Bernoulli heatmap for head pose estimation from a single RGB image. Our method localizes the head region while estimating the head angles. The Bernoulli heatmap makes it possible to construct fully convolutional neural networks without fully connected layers and offers a new output form for head pose estimation. A deep convolutional neural network (CNN) structure with multiscale representations is adopted to maintain high-resolution and low-resolution information in parallel, so that rich, high-resolution representations are preserved throughout. In addition, channelwise fusion is adopted so that the fusion weights are learnable, rather than combining features by simple addition with equal weights. As a result, the estimation is spatially more precise and potentially more accurate. The effectiveness of the proposed method is demonstrated empirically by comparison with other state-of-the-art methods on public datasets.
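The difference between equal-weight addition and channelwise fusion can be illustrated with a minimal NumPy sketch. It is not the network described here: the function name `channelwise_fusion`, the tensor shapes, and the weight values are assumptions chosen for illustration, and how the fusion weights are actually produced and trained is not specified in this abstract.

```python
import numpy as np

def channelwise_fusion(high_res, low_res_upsampled, w_high, w_low):
    """Fuse two (C, H, W) feature maps using one weight per channel per branch.

    Hypothetical helper: assumes both maps already share the same shape,
    i.e. the low-resolution branch has been upsampled beforehand.
    """
    # Reshape (C,) weights to (C, 1, 1) so they broadcast over the spatial axes.
    return (w_high[:, None, None] * high_res
            + w_low[:, None, None] * low_res_upsampled)

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8  # illustrative sizes only
high = rng.standard_normal((C, H, W))
low = rng.standard_normal((C, H, W))

# With every weight fixed to 1, the fusion reduces to simple addition.
ones = np.ones(C)
plain = channelwise_fusion(high, low, ones, ones)

# Non-uniform per-channel weights; in a trained network these would be
# learned parameters rather than hand-picked constants.
w_high = np.array([0.9, 0.5, 0.1, 1.0])
w_low = np.array([0.1, 0.5, 0.9, 0.0])
fused = channelwise_fusion(high, low, w_high, w_low)
```

In this sketch, each channel can lean on whichever resolution branch is more informative for it, whereas plain addition forces every channel to treat both branches equally.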
