Paper Information - Training a Feedback Loop for Hand Pose Estimation

Training a Feedback Loop for Hand Pose Estimation

We propose an entirely data-driven approach to estimating the 3D pose of a hand given a depth image. We show that we can correct the mistakes made by a Convolutional Neural Network trained to predict an estimate of the 3D pose by using a feedback loop. The components of this feedback loop are also Deep Networks, optimized using training data. They remove the need for fitting a 3D model to the input data, which requires both a carefully designed fitting function and algorithm. We show that our approach outperforms state-of-the-art methods, and is efficient as our implementation runs at over 400 fps on a single GPU.
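The feedback loop described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the components, so the split into a `predictor` (initial pose estimate), a `synthesizer` (renders an image from a pose), and an `updater` (proposes a pose correction from the observed and synthesized images) is an assumption made here for illustration. All function names are hypothetical, and the toy stand-ins below are simple arithmetic, not trained networks.

```python
from typing import Callable, List

Pose = List[float]
Image = List[float]


def refine_pose(
    image: Image,
    predictor: Callable[[Image], Pose],
    synthesizer: Callable[[Pose], Image],
    updater: Callable[[Image, Image], Pose],
    n_iters: int = 3,
) -> Pose:
    """Run an assumed predict / synthesize / update loop.

    In the approach summarized above, each callable would be a trained
    deep network; here they are arbitrary functions passed in by the caller.
    """
    pose = predictor(image)  # initial estimate, possibly inaccurate
    for _ in range(n_iters):
        synthesized = synthesizer(pose)      # image implied by current pose
        delta = updater(image, synthesized)  # proposed pose correction
        pose = [p + d for p, d in zip(pose, delta)]
    return pose


# --- Toy stand-ins (hypothetical, for illustration only) ---

def toy_synthesizer(pose: Pose) -> Image:
    # Pretend each pose parameter maps to one "pixel" via a fixed affine map.
    return [2.0 * p + 1.0 for p in pose]


def toy_predictor(image: Image) -> Pose:
    # A deliberately poor initial guess: all zeros.
    return [0.0 for _ in image]


def toy_updater(observed: Image, synthesized: Image) -> Pose:
    # Moves the pose part of the way toward agreement with the observation.
    return [0.25 * (o - s) for o, s in zip(observed, synthesized)]


def max_abs_error(a: Pose, b: Pose) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


true_pose = [0.5, -1.0, 2.0]
observed = toy_synthesizer(true_pose)

initial = toy_predictor(observed)
refined = refine_pose(observed, toy_predictor, toy_synthesizer,
                      toy_updater, n_iters=5)

initial_error = max_abs_error(initial, true_pose)
refined_error = max_abs_error(refined, true_pose)
```

With these toy functions each iteration halves the remaining pose error, so `refined_error` ends up well below `initial_error`. That convergence behavior is a property of the stand-ins only and says nothing about the accuracy or speed reported for the actual method.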
