Depth-Based Hand Pose Estimation: Methods, Data, and Challenges

Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state of the art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems and have released software and evaluation code. We summarize important conclusions here: (1) Coarse pose estimation appears viable for scenes with isolated hands. However, high-precision pose estimation (required for immersive virtual reality) and cluttered scenes (where hands may be interacting with nearby objects and surfaces) remain a challenge. To spur further progress, we introduce a challenging new dataset with diverse, cluttered scenes. (2) Many methods evaluate themselves with disparate criteria, making comparisons difficult. We define consistent evaluation criteria, rigorously motivated by human experiments. (3) We introduce a simple nearest-neighbor baseline that outperforms most existing systems. This implies that most systems do not generalize beyond their training sets. This also reinforces the under-appreciated point that training data is as important as the model itself. We conclude with directions for future progress.
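The nearest-neighbor baseline in point (3) can be illustrated with a minimal sketch. The abstract does not specify the features or the distance measure, so everything here is an assumption made for illustration: depth maps are taken to be pre-cropped around the hand, resized to a common size, and flattened; similarity is plain Euclidean distance; and the function name `nearest_neighbor_pose` is hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_neighbor_pose(query_depth, train_depths, train_poses):
    """Return the pose annotated on the training depth map closest to the query.

    Assumptions (not taken from the abstract): every depth map is a flat list
    of equal length, and closeness is measured by Euclidean distance.
    """
    if not train_depths or len(train_depths) != len(train_poses):
        raise ValueError("training depths and poses must be non-empty and aligned")
    # Index of the training example with the smallest distance to the query.
    best = min(range(len(train_depths)),
               key=lambda i: dist(query_depth, train_depths[i]))
    return train_poses[best]


# Toy data: three 4-pixel "depth maps", each paired with a one-joint 3D pose.
train_depths = [
    [0.5, 0.5, 0.6, 0.6],
    [0.9, 0.9, 0.8, 0.8],
    [0.2, 0.3, 0.2, 0.3],
]
train_poses = [
    [(0.0, 0.0, 0.5)],
    [(0.1, 0.0, 0.9)],
    [(0.0, 0.1, 0.2)],
]

# The query is closest to the second training example, so its pose is returned.
predicted = nearest_neighbor_pose([0.85, 0.9, 0.8, 0.75], train_depths, train_poses)
```

Because such a baseline can only return poses that already exist in the training set, its accuracy depends entirely on how well the training data covers the test poses, which is consistent with the abstract's remark that training data is as important as the model itself.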

[1] W. Stokoe,et al. Sign language structure: an outline of the visual communication systems of the American deaf. 1960. , 1961, Journal of deaf studies and deaf education.

[2] Anthony A. Maciejewski,et al. Computational modeling for the computer animation of legged figures , 1985, SIGGRAPH.

[3] Olivier D. Faugeras,et al. 3D Articulated Models and Multiview Tracking with Physical Forces , 2001, Comput. Vis. Image Underst..

[4] L. Wasserman,et al. Fast Algorithms and Efficient Statistics: N-Point Correlation Functions , 2000, astro-ph/0012333.

[5] Trevor Darrell,et al. Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6] Vladimir Vezhnevets,et al. A Survey on Pixel-Based Skin Color Detection Techniques , 2003 .

[7] Jitendra Malik,et al. Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Nicol N. Schraudolph,et al. 3D hand tracking by rapid stochastic gradient descent using a skinning model , 2004 .

[9] Pietro Perona,et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[10] King-Sun Fu,et al. IEEE Transactions on Pattern Analysis and Machine Intelligence Publication Information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Richard Szeliski,et al. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.

[12] Björn Stenger,et al. Model-based hand tracking using a hierarchical Bayesian filter , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Luc Van Gool,et al. European conference on computer vision (ECCV) , 2006, eccv 2006.

[14] Ulrich Neumann,et al. Real-time Hand Pose Recognition Using Low-Resolution Depth Images , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15] Mircea Nicolescu,et al. Vision-based hand pose estimation: A review , 2007, Comput. Vis. Image Underst..

[16] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[17] Danica Kragic,et al. Monocular real-time 3D articulated hand pose estimation , 2009, 2009 9th IEEE-RAS International Conference on Humanoid Robots.

[18] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[19] Dima Damen,et al. Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20] David A. McAllester,et al. Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21] Quang Nguyen,et al. Human Computer Interaction Using Hand Gestures , 2014, ICIC.

[22] Alexei A. Efros,et al. Unbiased look at dataset bias , 2011, CVPR 2011.

[23] Andrew W. Fitzgibbon,et al. Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[24] Barbara Caputo,et al. Using Object Affordances to Improve Object Recognition , 2011, IEEE Transactions on Autonomous Mental Development.

[25] Jonathan T. Barron,et al. A category-level 3-D object dataset: Putting the Kinect to work , 2011, ICCV Workshops.

[26] James M. Rehg,et al. Learning to recognize objects in egocentric activities , 2011, CVPR 2011.

[27] Yi Yang,et al. Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[28] Junsong Yuan,et al. Robust hand gesture recognition based on finger-earth mover's distance with a commodity depth camera , 2011, ACM Multimedia.

[29] Antonis A. Argyros,et al. Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.

[30] Pietro Perona,et al. Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31] Lale Akarun,et al. Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests , 2012, ECCV.

[32] Charless C. Fowlkes,et al. Do We Need More Training Data or Better Models for Object Detection? , 2012, BMVC.

[33] Nicolas Pugeault,et al. Sign language recognition using sub-units , 2012, J. Mach. Learn. Res..

[34] Luis Salgado,et al. Efficient spatio-temporal hole filling strategy for Kinect depth maps , 2012, Electronic Imaging.

[35] Luc Van Gool,et al. Motion Capture of Hands in Action Using Discriminative Salient Points , 2012, ECCV.

[36] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37] Fei-Fei Li,et al. Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going? , 2013, 2013 IEEE International Conference on Computer Vision.

[38] Cheng Li,et al. Pixel-Level Hand Detection in Ego-centric Videos , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39] Aaron M. Dollar,et al. Grasp Frequency and Usage in Daily Household and Machine Shop Tasks , 2013, IEEE Transactions on Haptics.

[40] Tae-Kyun Kim,et al. Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests , 2013, 2013 IEEE International Conference on Computer Vision.

[41] Haibin Ling,et al. Finding the Best from the Second Bests - Inhibiting Subjective Bias in Evaluation of Visual Tracking Algorithms , 2013, 2013 IEEE International Conference on Computer Vision.

[42] Li Cheng,et al. Efficient Hand Pose Estimation from a Single Depth Image , 2013, 2013 IEEE International Conference on Computer Vision.

[43] Yi Yang,et al. Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44] Andrea Fossati,et al. Consumer Depth Cameras for Computer Vision , 2013, Advances in Computer Vision and Pattern Recognition.

[45] Antti Oulasvirta,et al. Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data , 2013, 2013 IEEE International Conference on Computer Vision.

[46] Sterling Orsten,et al. Dynamics based 3D skeletal hand tracking , 2013, I3D '13.

[47] Danica Kragic,et al. A Metric for Comparing the Anthropomorphic Motion Capability of Artificial Hands , 2013, IEEE Transactions on Robotics.

[48] Jitendra Malik,et al. Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[49] Chen Qian,et al. Realtime and Robust Hand Tracking from Depth , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[50] Deva Ramanan,et al. 3D Hand Pose Detection in Egocentric RGB-D Images , 2014, ECCV Workshops.

[51] Varun Ramakrishna,et al. User-Specific Hand Modeling from Monocular Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[52] Jianxiong Xiao,et al. Sliding Shapes for 3D Object Detection in Depth Images , 2014, ECCV.

[53] Mohan M. Trivedi,et al. Hand Gesture Recognition in Real Time for Automotive Interfaces: A Multimodal Vision-Based Approach and Evaluations , 2014, IEEE Transactions on Intelligent Transportation Systems.

[54] Hedvig Kjellström,et al. Audio-visual classification and detection of human manipulation actions , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[55] David G. Lowe,et al. Scalable Nearest Neighbor Algorithms for High Dimensional Data , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56] Ken Perlin,et al. Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[57] Dimitrios Tzionas,et al. Capturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model with Salient Points , 2014, GCPR.

[58] Tae-Kyun Kim,et al. Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[59] Andrew W. Fitzgibbon,et al. Learning an efficient model of hand shape variation from depth images , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[60] Antti Oulasvirta,et al. Fast and robust hand tracking using detection-guided optimization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61] Karthik Ramani,et al. A Collaborative Filtering Approach to Real-Time Hand Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[62] Vincent Lepetit,et al. Training a Feedback Loop for Hand Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[63] Andrew W. Fitzgibbon,et al. Accurate, Robust, and Flexible Real-time Hand Tracking , 2015, CHI.

[64] Haibin Ling,et al. 3D Hand Pose Estimation Using Randomized Decision Forest with Segmentation Index Points , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[65] Ron Kimmel,et al. Rule of thumb: Deep derotation for improved fingertip detection , 2015, BMVC.

[66] Jian Sun,et al. Cascaded hand pose regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67] Vincent Lepetit,et al. Hands Deep in Deep Learning for Hand Pose Estimation , 2015, ArXiv.

[68] Deva Ramanan,et al. First-person pose recognition using egocentric workspaces , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69] Deva Ramanan,et al. Understanding Everyday Hands in Action from RGB-D Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[70] Tae-Kyun Kim,et al. Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[71] Luc Van Gool,et al. Hand Pose Estimation from Local Surface Normals , 2016, ECCV.

[72] Andrew W. Fitzgibbon,et al. Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences , 2016, ACM Trans. Graph..

[73] Vincent Lepetit,et al. Efficiently Creating 3D Training Data for Fine Hand Pose Estimation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[74] Qi Ye,et al. Spatial Attention Deep Net with Partial PSO for Hierarchical Hybrid Hand Pose Estimation , 2016, ECCV.