Accurate, Robust, and Flexible Real-time Hand Tracking

We present a new real-time hand tracking system based on a single depth camera. The system accurately reconstructs complex hand poses across a variety of subjects. It also tracks robustly, recovering rapidly from temporary failures. Most notably, our tracker is highly flexible, improving dramatically upon previous approaches, which have focused on front-facing, close-range scenarios. This flexibility opens up new possibilities for human-computer interaction. Examples include tracking at distances from tens of centimeters to several meters (for controlling a TV at a distance), tracking with a moving depth camera (for mobile scenarios), and arbitrary camera placements (for VR headsets). These features are achieved through a new pipeline that combines a multi-layered discriminative reinitialization strategy for per-frame pose estimation with a subsequent generative model-fitting stage. We provide extensive technical details and a detailed qualitative and quantitative analysis.
