Estimating 2D Multi-hand Poses from Single Depth Images

We present a novel framework based on Pictorial Structure (PS) models to estimate 2D multi-hand poses from depth images. Most existing single-hand pose estimation algorithms are either subject to strong assumptions or depend on a weak detector to detect the human hand. We utilize Mask R-CNN to avoid both aforementioned constraints. The proposed framework allows detection of multi-hand instances and localization of hand joints simultaneously. Our experiments show that our method is superior to existing methods.

[1]  Thomas Brox,et al.  Learning to Estimate 3D Hand Pose from Single RGB Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Ken Perlin,et al.  Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[3]  Antti Oulasvirta,et al.  Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data , 2013, 2013 IEEE International Conference on Computer Vision.

[4]  Vittorio Ferrari,et al.  We Are Family: Joint Pose Estimation of Multiple Persons , 2010, ECCV.

[5]  Xiaogang Wang,et al.  Structured Feature Learning for Pose Estimation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Tae-Kyun Kim,et al.  Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Ronan Collobert,et al.  Learning to Segment Object Candidates , 2015, NIPS.

[8]  Yi Li,et al.  Fully Convolutional Instance-Aware Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Bernt Schiele,et al.  DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model , 2016, ECCV.

[10]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[13]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Daniel Thalmann,et al.  Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Gang Yu,et al.  Cascaded Pyramid Network for Multi-person Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Sergio Escalera,et al.  Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Andrew Zisserman,et al.  Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Vincent Lepetit,et al.  Hands Deep in Deep Learning for Hand Pose Estimation , 2015, ArXiv.

[21]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[22]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[23]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[25]  Daniel Thalmann,et al.  3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, CVPR.