Efficient Hand Pose Estimation from a Single Depth Image

We tackle the practical problem of hand pose estimation from a single noisy depth image. A dedicated three-step pipeline is proposed: Initial estimation step provides an initial estimation of the hand in-plane orientation and 3D location, Candidate generation step produces a set of 3D pose candidate from the Hough voting space with the help of the rotational invariant depth features, Verification step delivers the final 3D hand pose as the solution to an optimization problem. We analyze the depth noises, and suggest tips to minimize their negative impacts on the overall performance. Our approach is able to work with Kinect-type noisy depth images, and reliably produces pose estimations of general motions efficiently (12 frames per second). Extensive experiments are conducted to qualitatively and quantitatively evaluate the performance with respect to the state-of-the-art methods that have access to additional RGB images. Our approach is shown to deliver on par or even better results.

[1]  Peter J. Huber,et al.  Robust Statistics , 2005, Wiley Series in Probability and Statistics.

[2]  Edmund Y. S. Chao,et al.  Biomechanics of the hand : a basic research study , 1989 .

[3]  Takeo Kanade,et al.  Visual Tracking of High DOF Articulated Structures: an Application to Human Hand Tracking , 1994, ECCV.

[4]  Vladimir Pavlovic,et al.  Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Paulo R. S. Mendonça,et al.  Model-based 3D tracking of an articulated hand , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[6]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[8]  B. Ripley,et al.  Robust Statistics , 2018, Wiley Series in Probability and Statistics.

[9]  David W. Murray,et al.  Regression-based Hand Pose Estimation from Multiple Cameras , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[10]  Mircea Nicolescu,et al.  Vision-based hand pose estimation: A review , 2007, Comput. Vis. Image Underst..

[11]  Michael I. Miller,et al.  Pattern Theory: From Representation to Inference , 2007 .

[12]  Dinesh K. Pai,et al.  Musculotendon simulation for hand animation , 2008, ACM Trans. Graph..

[13]  Robert Y. Wang,et al.  Real-time hand-tracking with a color glove , 2009, ACM Trans. Graph..

[14]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[15]  ParagiosNikos,et al.  Model-Based 3D Hand Pose Estimation from Monocular Video , 2011 .

[16]  Andrew W. Fitzgibbon,et al.  Efficient regression of general-activity human poses from depth images , 2011, 2011 International Conference on Computer Vision.

[17]  Antonis A. Argyros,et al.  Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.

[18]  Patrick van der Smagt,et al.  Human hand modelling: kinematics, dynamics, applications , 2012, Biological Cybernetics.

[19]  Jinxiang Chai,et al.  Combining marker-based mocap and RGB-D camera for acquiring high-fidelity hand motion data , 2012, SCA '12.

[20]  Lale Akarun,et al.  Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests , 2012, ECCV.

[21]  Luc Van Gool,et al.  Motion Capture of Hands in Action Using Discriminative Salient Points , 2012, ECCV.

[22]  Georg Stillfried,et al.  Kinematic Modelling of the Human Hand for Robotics , 2015 .