A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image

For 3D hand and body pose estimation task in depth image, a novel anchor-based approach termed Anchor-to-Joint regression network (A2J) with the end-to-end learning ability is proposed. Within A2J, anchor points able to capture global-local spatial context information are densely set on depth image as local regressors for the joints. They contribute to predict the positions of the joints in ensemble way to enhance generalization ability. The proposed 3D articulated pose estimation paradigm is different from the state-of-the-art encoder-decoder based FCN, 3D CNN and point-set based manners. To discover informative anchor points towards certain joint, anchor proposal procedure is also proposed for A2J. Meanwhile 2D CNN (i.e., ResNet- 50) is used as backbone network to drive A2J, without using time-consuming 3D convolutional or deconvolutional layers. The experiments on 3 hand datasets and 2 body datasets verify A2J’s superiority. Meanwhile, A2J is of high running speed around 100 FPS on single NVIDIA 1080Ti GPU.

[1]  Qi Ye,et al.  BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jitendra Malik,et al.  Human Pose Estimation with Iterative Error Feedback , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Daniel Thalmann,et al.  Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Shanxin Yuan,et al.  The 2017 Hands in the Million Challenge on 3D Hand Pose Estimation , 2017, ArXiv.

[5]  Guijin Wang,et al.  Towards Good Practices for Deep 3D Hand Pose Estimation , 2017, ArXiv.

[6]  Kyoung Mu Lee,et al.  V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Wei Zhang,et al.  Convolutional Memory Blocks for Depth Data Representation Learning , 2018, IJCAI.

[8]  Alain Trémeau,et al.  Multi-task, multi-domain learning: Application to semantic segmentation and pose regression , 2017, Neurocomputing.

[9]  Tae-Kyun Kim,et al.  Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Guijin Wang,et al.  Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation , 2017, Neurocomputing.

[11]  Sergio Escalera,et al.  Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Daniel Thalmann,et al.  3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Sergio Escalera,et al.  End-to-end Global to Local CNN Learning for Hand Pose Recovery in Depth data , 2017, ArXiv.

[14]  Junsong Yuan,et al.  Point-to-Point Regression PointNet for 3D Hand Pose Estimation , 2018, ECCV.

[15]  Ho Yub Jung,et al.  Random tree walk toward instantaneous 3D human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[17]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[19]  Luc Van Gool,et al.  Dense 3D Regression for Hand Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Ken Perlin,et al.  Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[21]  Kyoung Mu Lee,et al.  Holistic Planimetric prediction to Local Volumetric prediction for 3D Human Pose Estimation , 2017, ArXiv.

[22]  Qingmin Liao,et al.  Depth-images-based pose estimation using regression forests and graphical models , 2015, Neurocomputing.

[23]  Liang Chang,et al.  Hand3D: Hand Pose Estimation using 3D Neural Network , 2017, ArXiv.

[24]  Luc Van Gool,et al.  Crossing Nets: Combining GANs and VAEs with a Shared Latent Space for Hand Pose Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Sebastian Nowozin,et al.  DISCO Nets : DISsimilarity COefficients Networks , 2016, NIPS.

[26]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Yichen Wei,et al.  Model-Based Deep Hand Pose Estimation , 2016, IJCAI.

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Fei Qiao,et al.  Region ensemble network: Improving convolutional network for hand pose estimation , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[30]  Georgios Tzimiropoulos,et al.  Human Pose Estimation via Convolutional Part Heatmap Regression , 2016, ECCV.

[31]  Ruigang Yang,et al.  Accurate 3D pose estimation from a single depth image , 2011, 2011 International Conference on Computer Vision.

[32]  Fei-Fei Li,et al.  Towards Viewpoint Invariant 3D Human Pose Estimation , 2016, ECCV.

[33]  Jian Sun,et al.  Cascaded hand pose regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Junsong Yuan,et al.  Hand PointNet: 3D Hand Pose Estimation Using Point Sets , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[36]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[38]  Hui Cheng,et al.  Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning , 2016, ACM Multimedia.

[39]  Vincent Lepetit,et al.  Hands Deep in Deep Learning for Hand Pose Estimation , 2015, ArXiv.

[40]  Vincent Lepetit,et al.  DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[41]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[44]  Sebastian Thrun,et al.  Real-Time Human Pose Tracking from Range Data , 2012, ECCV.

[45]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[46]  Yichen Wei,et al.  Integral Human Pose Regression , 2017, ECCV.

[47]  Yu Zhang,et al.  Lie-X: Depth Image Based Articulated Object Pose Estimation, Tracking, and Action Recognition on Lie Groups , 2016, International Journal of Computer Vision.

[48]  Danica Kragic,et al.  Monocular real-time 3D articulated hand pose estimation , 2009, 2009 9th IEEE-RAS International Conference on Humanoid Robots.

[49]  Nadia Magnenat-Thalmann,et al.  Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).