Leveraging Ordinal Regression With Soft Labels For 3D Head Pose Estimation From Point Sets

Head pose estimation from depth images is a challenging problem owing to large pose variations, severe occlusions, and the low quality of depth data. In contrast to existing approaches that take a 2D depth image as input, we propose a novel deep regression architecture called Head PointNet, which consumes 3D point sets derived from a depth image and describing the visible surface of a head. To cope with the non-stationary nature of the pose variation process, the network is equipped with an ordinal regression module that incorporates metric penalties into the ground-truth label representation. The resulting soft label representation encodes both the inter-class and the intra-class information contained in the class labels, and guides the network to learn discriminative features. Experiments on two challenging datasets, the Biwi Head Pose Dataset and the Pandora Dataset, show that the proposed method outperforms state-of-the-art approaches.
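
The soft label idea can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following assumes one common construction: the ground-truth angle is turned into a distribution over angle bins by applying a softmax to the negative metric penalty between the true angle and each bin center. The function names, the 15-degree bin layout, the absolute-difference penalty, and the `scale` parameter are all illustrative assumptions, not details taken from the paper.

```python
import math


def soft_labels(angle_deg, bin_centers, scale=1.0):
    """Soft label vector for one angle (assumed formulation).

    Each bin gets weight exp(-penalty), where the penalty is the absolute
    distance between the true angle and the bin center divided by `scale`.
    The weights are normalized so that they sum to 1.
    """
    penalties = [abs(angle_deg - c) / scale for c in bin_centers]
    smallest = min(penalties)  # shift for numerical stability
    weights = [math.exp(-(p - smallest)) for p in penalties]
    total = sum(weights)
    return [w / total for w in weights]


def cross_entropy(soft_target, predicted_probs, eps=1e-12):
    """Cross-entropy between a soft target and predicted class probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(soft_target, predicted_probs))


# Hypothetical yaw bins: centers every 15 degrees from -90 to 90.
bin_centers = list(range(-90, 91, 15))

# Soft target for a ground-truth yaw of 20 degrees.
target = soft_labels(20.0, bin_centers, scale=15.0)

# A uniform prediction, used only to compare loss values.
uniform = [1.0 / len(bin_centers)] * len(bin_centers)
loss_uniform = cross_entropy(target, uniform)
loss_matched = cross_entropy(target, target)
```

Under these assumptions, bins near the true angle receive more mass than distant ones, which carries the inter-class ordering. Two angles that fall in the same hard bin, such as 16 and 20 degrees, still produce different soft label vectors, which is one way to read the intra-class information mentioned above.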
