Simple and effective deep hand shape and pose regression from a single depth image

Abstract: Simultaneously estimating the 3D shape and pose of a hand in real time is a new and challenging computer graphics problem, which is important for animation and for interaction with 3D objects in virtual environments using personalized hand shapes. CNN-based direct hand pose estimation methods are the state of the art, but they regress only a 3D hand pose from a single depth image. In this study, we developed a simple and effective real-time CNN-based direct regression approach that simultaneously estimates the 3D hand shape, the pose, and structure constraints for both egocentric and third-person viewpoints by learning from synthetic depth data. In addition, we produced the first million-scale egocentric synthetic dataset, called SynHandEgo, which contains egocentric depth images with accurate shape and pose annotations, as well as color segmentation of the hand parts. Our network is trained on combined real and synthetic datasets, with full supervision of the hand pose and structure constraints and semi-supervision of the hand mesh. Our approach performed better than the state-of-the-art methods on the SynHand5M synthetic dataset in terms of both 3D shape and pose recovery. By learning simultaneously from real and synthetic data, we demonstrated the feasibility of hand mesh recovery on two real hand pose datasets, i.e., BigHand2.2M and NYU. Moreover, on the NYU dataset, our method obtained more accurate 3D hand pose estimates than the existing methods that output more than joint positions. The SynHandEgo dataset has been made publicly available to promote further research in the emerging domain of hand shape and pose recovery from egocentric viewpoints (https://bit.ly/2WMWM5u).
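The training setup described in the abstract (pose supervised on every sample, mesh supervised only where mesh ground truth exists) can be illustrated with a minimal sketch. This is not the authors' loss: the abstract does not give its form, so the mean squared error, the `mesh_weight` parameter, the dictionary keys, and the function names below are all assumptions chosen for illustration, and the structure-constraint terms are left out because the abstract does not define them.

```python
def mean_squared_error(pred, target):
    """Mean squared error over two equally sized flat lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(batch, mesh_weight=1.0):
    """Hypothetical pose-plus-mesh loss over a mixed real/synthetic batch.

    batch: list of dicts with keys "pred_pose", "gt_pose", "pred_mesh",
    and "gt_mesh". "gt_mesh" is None for samples without mesh annotations
    (assumed here to be the real depth images).
    """
    # Pose is supervised on every sample.
    pose_terms = [mean_squared_error(s["pred_pose"], s["gt_pose"]) for s in batch]
    # Mesh is supervised only on samples that carry mesh ground truth.
    mesh_terms = [
        mean_squared_error(s["pred_mesh"], s["gt_mesh"])
        for s in batch
        if s["gt_mesh"] is not None
    ]
    pose_loss = sum(pose_terms) / len(pose_terms)
    # Avoid dividing by zero when no sample in the batch has a mesh label.
    mesh_loss = sum(mesh_terms) / len(mesh_terms) if mesh_terms else 0.0
    return pose_loss + mesh_weight * mesh_loss
```

Averaging the mesh term only over annotated samples keeps its scale independent of how many real images a batch happens to contain, which is one plausible design choice when mixing the two data sources.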
