Fast Tracking of Hand and Finger Articulations Using a Single Depth Camera

Using hand gestures as input in human–computer interaction is of ever-increasing interest. Markerless tracking of hands and fingers is a promising enabler, but adoption has been hampered by tracking problems, complex and dense capture setups, high computing requirements, equipment costs, and poor latency. In this paper, we present a method that addresses these issues. Our method tracks rapid and complex articulations of the hand using a single depth camera. It is fast (50 fps without GPU support) and supports varying close-range camera-to-scene arrangements, such as desktop or egocentric settings, where the camera can even move. We frame pose estimation as an optimization problem in depth using a new objective function based on a collection of Gaussian functions, focusing particularly on robust tracking of finger articulations. We demonstrate the benefits of the method in several interaction applications, ranging from manipulating objects in a 3D blocks world to egocentric interaction on the go. We also present an extensive evaluation of our method on publicly available datasets, which shows that it achieves competitive accuracy.
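
The abstract does not spell out the objective function, so the following is only a minimal sketch of the general idea of scoring a pose by the overlap between two collections of Gaussians. It assumes isotropic, unnormalized 3D Gaussians and a toy "pose" that is a pure translation; the names `gaussian_overlap`, `similarity`, and `translate` are hypothetical and not taken from the paper. The closed-form overlap integral used here is a standard property of Gaussians, not a statement of the authors' actual energy.

```python
import math

def gaussian_overlap(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form integral over R^3 of the product of two isotropic,
    unnormalized Gaussians exp(-||x - mu||^2 / (2 * sigma^2))."""
    var_sum = sigma_a ** 2 + sigma_b ** 2
    dist_sq = sum((p - q) ** 2 for p, q in zip(mu_a, mu_b))
    scale = (2.0 * math.pi * sigma_a ** 2 * sigma_b ** 2 / var_sum) ** 1.5
    return scale * math.exp(-dist_sq / (2.0 * var_sum))

def similarity(model, data):
    """Sum of pairwise overlaps between model and data Gaussians.
    Larger values mean the two collections agree better (assumed toy energy)."""
    return sum(
        gaussian_overlap(m_mu, m_sigma, d_mu, d_sigma)
        for m_mu, m_sigma in model
        for d_mu, d_sigma in data
    )

def translate(model, offset):
    """Toy stand-in for a pose change: shift every model Gaussian by offset."""
    return [
        (tuple(c + o for c, o in zip(mu, offset)), sigma)
        for mu, sigma in model
    ]

# Two "observed" Gaussians in arbitrary units (hypothetical data).
data = [((0.0, 0.0, 50.0), 1.0), ((2.0, 0.0, 50.0), 1.0)]

# A model that matches the data, and the same model shifted sideways.
aligned = list(data)
shifted = translate(aligned, (3.0, 0.0, 0.0))

energy_aligned = similarity(aligned, data)
energy_shifted = similarity(shifted, data)

# An optimizer would search pose parameters to maximize this score.
print(energy_aligned > energy_shifted)
```

Because every term is a smooth function of the Gaussian centers, a score of this kind can be differentiated with respect to pose parameters, which is one reason Gaussian-based energies suit fast gradient-based tracking.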
