Dense 3D Regression for Hand Pose Estimation

We present a simple and effective method for 3D hand pose estimation from a single depth frame. As opposed to previous state-of-the-art methods based on holistic 3D regression, our method works on dense pixel-wise estimation. This is achieved by careful design choices in pose parameterization, which leverages both 2D and 3D properties of depth map. Specifically, we decompose the pose parameters into a set of per-pixel estimations, i.e., 2D heat maps, 3D heat maps and unit 3D directional vector fields. The 2D/3D joint heat maps and 3D joint offsets are estimated via multitask network cascades, which is trained end-to-end. The pixel-wise estimations can be directly translated into a vote casting scheme. A variant of mean shift is then used to aggregate local votes while enforcing consensus between the the estimated 3D pose and the pixel-wise 2D and 3D estimations by design. Our method is efficient and highly accurate. On MSRA and NYU hand dataset, our method outperforms all previous state-of-the-art approaches by a large margin. On the ICVL hand dataset, our method achieves similar accuracy compared to the nearly saturated result obtained by [5] and outperforms various other proposed methods. Code is available online1.

[1]  Ken Perlin,et al.  Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[2]  Fei Qiao,et al.  Region ensemble network: Improving convolutional network for hand pose estimation , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[3]  Tae-Kyun Kim,et al.  Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Karthik Ramani,et al.  Learning Hand Articulations by Hallucinating Heat Distribution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[7]  Yichen Wei,et al.  Model-Based Deep Hand Pose Estimation , 2016, IJCAI.

[8]  Cordelia Schmid,et al.  LCR-Net: Localization-Classification-Regression for Human Pose , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Deva Ramanan,et al.  3D Human Pose Estimation = 2D Pose Estimation + Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Luc Van Gool,et al.  Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities , 2011, NIPS.

[12]  Seyed-Ahmad Ahmadi,et al.  Hough-CNN: Deep learning for segmentation of deep brain regions in MRI and ultrasound , 2016, Comput. Vis. Image Underst..

[13]  Antonis A. Argyros,et al.  Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.

[14]  Qi Ye,et al.  Spatial Attention Deep Net with Partial PSO for Hierarchical Hybrid Hand Pose Estimation , 2016, ECCV.

[15]  Luc Van Gool,et al.  Hand Pose Estimation from Local Surface Normals , 2016, ECCV.

[16]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Luc Van Gool,et al.  Crossing Nets: Combining GANs and VAEs with a Shared Latent Space for Hand Pose Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[19]  Yichen Wei,et al.  Compositional Human Pose Regression , 2018, Comput. Vis. Image Underst..

[20]  Qi Ye,et al.  BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Daniel Thalmann,et al.  Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Lourdes Agapito,et al.  Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[24]  Cristian Sminchisescu,et al.  Deep Multitask Architecture for Integrated 2D and 3D Human Sensing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  David J. Fleet,et al.  Stochastic Tracking of 3D Human Figures Using 2D Image Motion , 2000, ECCV.

[26]  Francesc Moreno-Noguer,et al.  3D Human Pose Estimation from a Single Image via Distance Matrix Regression , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Guijin Wang,et al.  Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation , 2017, Neurocomputing.

[28]  Daniel Thalmann,et al.  3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yichen Wei,et al.  Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[31]  Xiaowei Zhou,et al.  Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yu Zhang,et al.  Lie-X: Depth Image Based Articulated Object Pose Estimation, Tracking, and Action Recognition on Lie Groups , 2016, International Journal of Computer Vision.

[33]  Tae-Kyun Kim,et al.  Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Jonathan Tompson,et al.  Towards Accurate Multi-person Pose Estimation in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Shimon Ullman,et al.  Human Pose Estimation Using Deep Consensus Voting , 2016, ECCV.

[37]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[38]  Hui Cheng,et al.  Recurrent 3D Pose Sequence Machines , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Vincent Lepetit,et al.  Hands Deep in Deep Learning for Hand Pose Estimation , 2015, ArXiv.

[40]  Vincent Lepetit,et al.  DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[41]  Andrew W. Fitzgibbon,et al.  The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[43]  Vincent Lepetit,et al.  Training a Feedback Loop for Hand Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[44]  Thomas Wolf,et al.  How to Refine 3D Hand Pose Estimation from Unlabelled Depth Data? , 2017, 2017 International Conference on 3D Vision (3DV).

[45]  Yi Yang,et al.  Depth-Based Hand Pose Estimation: Data, Methods, and Challenges , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46]  Sergey Ioffe,et al.  Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models , 2017, NIPS.

[47]  Hai Su,et al.  Deep Voting: A Robust Approach Toward Nucleus Localization in Microscopy Images , 2015, MICCAI.

[48]  Jian Sun,et al.  Cascaded hand pose regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Karthik Ramani,et al.  A Collaborative Filtering Approach to Real-Time Hand Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[50]  Andrew Blake,et al.  Efficient Human Pose Estimation from Single Depth Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Chen Qian,et al.  Realtime and Robust Hand Tracking from Depth , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[53]  Kyoung Mu Lee,et al.  Holistic Planimetric prediction to Local Volumetric prediction for 3D Human Pose Estimation , 2017, ArXiv.

[54]  Sergio Escalera,et al.  Occlusion Aware Hand Pose Recovery from Sequences of Depth Images , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[55]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.