Recurrent 3D Hand Pose Estimation Using Cascaded Pose-Guided 3D Alignments

3D hand pose estimation is a challenging problem in computer vision due to the high degrees-of-freedom of hand articulated motion space and large viewpoint variation. As a consequence, similar poses observed from multiple views can be dramatically different. In order to deal with this issue, view-independent features are required to achieve state-of-the-art performance. In this paper, we investigate the impact of view-independent features on 3D hand pose estimation from a single depth image, and propose a novel recurrent neural network for 3D hand pose estimation, in which a cascaded 3D pose-guided alignment strategy is designed for view-independent feature extraction and a recurrent hand pose module is designed for modeling the dependencies among sequential aligned features for 3D hand pose estimation. In particular, our cascaded pose-guided 3D alignments are performed in 3D space in a coarse-to-fine fashion. First, hand joints are predicted and globally transformed into a canonical reference frame; Second, the palm of the hand is detected and aligned; Third, local transformations are applied to the fingers to refine the final predictions. The proposed recurrent hand pose module for aligned 3D representation can extract recurrent pose-aware features and iteratively refines the estimated hand pose. Our recurrent module could be utilized for both single-view estimation and sequence-based estimation with 3D hand pose tracking. Experiments show that our method improves the state-of-the-art by a large margin on popular benchmarks with the simple yet efficient alignment and network architectures.

[1]  Hyung Jin Chang,et al.  SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation , 2020, ECCV.

[2]  Howie Choset,et al.  One Framework to Register Them All: PointNet Encoding for Point Cloud Alignment , 2019, ArXiv.

[3]  Yang Xiao,et al.  A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Yi Sun,et al.  CrossInfoNet: Multi-Task Information Sharing Based Hand Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Dongheui Lee,et al.  Point-To-Pose Voting Based Hand Pose Estimation Using Residual Permutation Equivariant Layer , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Gang Wang,et al.  Context-Aware Deep Spatiotemporal Network for Hand Pose Estimation From Depth Images , 2018, IEEE Transactions on Cybernetics.

[7]  Junsong Yuan,et al.  Point-to-Point Regression PointNet for 3D Hand Pose Estimation , 2018, ECCV.

[8]  Junsong Yuan,et al.  Hand PointNet: 3D Hand Pose Estimation Using Point Sets , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[10]  Vincent Lepetit,et al.  Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Luc Van Gool,et al.  Dense 3D Regression for Hand Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Kyoung Mu Lee,et al.  V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Andrew W. Fitzgibbon,et al.  Online generative model personalization for hand tracking , 2017, ACM Trans. Graph..

[14]  David Kim,et al.  Articulated distance fields for ultra-fast tracking of hands interacting , 2017, ACM Trans. Graph..

[15]  Vincent Lepetit,et al.  DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[16]  Guijin Wang,et al.  Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation , 2017, Neurocomputing.

[17]  Daniel Thalmann,et al.  3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Shanxin Yuan,et al.  The 2017 Hands in the Million Challenge on 3D Hand Pose Estimation , 2017, ArXiv.

[19]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[20]  Thomas Brox,et al.  Learning to Estimate 3D Hand Pose from Single RGB Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Liang Chang,et al.  Hand3D: Hand Pose Estimation using 3D Neural Network , 2017, ArXiv.

[22]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Andrea Tagliasacchi,et al.  Sphere-meshes for real-time hand modeling and tracking , 2016, ACM Trans. Graph..

[24]  Andrew W. Fitzgibbon,et al.  Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences , 2016, ACM Trans. Graph..

[25]  Daniel Thalmann,et al.  Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Yichen Wei,et al.  Model-Based Deep Hand Pose Estimation , 2016, IJCAI.

[27]  Andrew W. Fitzgibbon,et al.  Fits Like a Glove: Rapid and Reliable Hand Shape Personalization , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Vincent Lepetit,et al.  Efficiently Creating 3D Training Data for Fine Hand Pose Estimation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Qi Ye,et al.  Spatial Attention Deep Net with Partial PSO for Hierarchical Hybrid Hand Pose Estimation , 2016, ECCV.

[30]  Fei-Fei Li,et al.  Towards Viewpoint Invariant 3D Human Pose Estimation , 2016, ECCV.

[31]  Vincent Lepetit,et al.  Training a Feedback Loop for Hand Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Jitendra Malik,et al.  Human Pose Estimation with Iterative Error Feedback , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Jian Sun,et al.  Cascaded hand pose regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Andrew W. Fitzgibbon,et al.  Learning an efficient model of hand shape variation from depth images , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Yi Yang,et al.  Depth-Based Hand Pose Estimation: Methods, Data, and Challenges , 2015, International Journal of Computer Vision.

[36]  Andrew W. Fitzgibbon,et al.  Accurate, Robust, and Flexible Real-time Hand Tracking , 2015, CHI.

[37]  Vincent Lepetit,et al.  Hands Deep in Deep Learning for Hand Pose Estimation , 2015, ArXiv.

[38]  Ken Perlin,et al.  Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[39]  Tae-Kyun Kim,et al.  Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Varun Ramakrishna,et al.  User-Specific Hand Modeling from Monocular Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Chen Qian,et al.  Realtime and Robust Hand Tracking from Depth , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Tae-Kyun Kim,et al.  Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests , 2013, 2013 IEEE International Conference on Computer Vision.

[43]  Mircea Nicolescu,et al.  Vision-based hand pose estimation: A review , 2007, Comput. Vis. Image Underst..

[44]  Dimitrios Tzionas,et al.  Embodied Hands: Modeling and Capturing Hands and Bodies Together , 2022, ArXiv.

[45]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .