Hierarchical neural network for hand pose estimation

Abstract: Hand pose estimation plays an important role in human–computer interaction and augmented reality. Regressing the joint coordinates is difficult because of the flexibility of the joints, self-occlusion, and similar factors. In this paper, we propose a novel and simple hierarchical neural network for hand pose estimation. The hand joint coordinates are divided into six parts, and the hierarchical architecture regresses the parts in sequence. This splits the complex task of regressing all hand joint coordinates into several sub-tasks, which makes the estimation more accurate. When the joint coordinates of one part are regressed, the features of the other parts may have a negative influence because the fingers are similar to one another, so we use an interference cancellation operation in the hierarchical architecture. Once the joint coordinates of one part have been regressed, the corresponding features are removed from the global hand feature to eliminate the interference from this part. The remaining features are then used as input for regressing the joint coordinates of the next part. An ablation study verifies the effectiveness of the hierarchical architecture. Experimental results on four public hand pose datasets show that our method achieves state-of-the-art or comparable results relative to existing methods.
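The sequential regression with interference cancellation described above can be illustrated with a minimal NumPy sketch. It is not the network from the paper: the linear maps `W_feat` and `W_joint`, the `tanh` nonlinearity, the feature size `D`, and the split of the six parts into a palm joint plus five four-joint fingers are all assumptions made for illustration. Only the overall flow (regress one part, subtract its feature from the global feature, pass the remainder to the next part) follows the abstract.

```python
import numpy as np

def hierarchical_regress(global_feature, part_heads):
    """Regress joints part by part, removing each part's feature afterwards.

    global_feature: (D,) array standing in for the hand global feature.
    part_heads: list of (W_feat, W_joint) pairs, one per part (hypothetical):
        W_feat  (D, D)       maps the remaining feature to this part's feature
        W_joint (3 * J_p, D) maps the part feature to J_p 3D joint coordinates
    """
    remaining = global_feature.copy()
    joints = []
    for W_feat, W_joint in part_heads:
        # Extract the feature for the current part from what is left.
        part_feature = np.tanh(W_feat @ remaining)
        # Regress this part's joint coordinates from its own feature.
        joints.append((W_joint @ part_feature).reshape(-1, 3))
        # Interference cancellation: remove this part's feature so it
        # cannot disturb the regression of the following parts.
        remaining = remaining - part_feature
    return np.concatenate(joints, axis=0), remaining

rng = np.random.default_rng(0)
D = 32                                 # assumed feature dimension
joints_per_part = [1, 4, 4, 4, 4, 4]   # assumed: palm + five fingers
heads = [
    (rng.normal(scale=0.1, size=(D, D)),
     rng.normal(scale=0.1, size=(3 * j, D)))
    for j in joints_per_part
]
pose, leftover = hierarchical_regress(rng.normal(size=D), heads)
print(pose.shape)  # (21, 3)
```

In a trained model the two maps per part would be learned layers and the order of the parts would be a design choice; the sketch only fixes the data flow between parts.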
