Capturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model with Salient Points

Hand motion capture has been an active research topic, following the success of full-body pose tracking. Despite similarities, hand tracking proves to be more challenging, characterized by a higher dimensionality, severe occlusions and self-similarity between fingers. For this reason, most approaches rely on strong assumptions, like hands in isolation or expensive multi-camera systems, that limit practical use. In this work, we propose a framework for hand tracking that can capture the motion of two interacting hands using only a single, inexpensive RGB-D camera. Our approach combines a generative model with collision detection and discriminatively learned salient points. We quantitatively evaluate our approach on 14 new sequences with challenging interactions.

[1]  Gérard G. Medioni,et al.  Object modeling by registration of multiple range images , 1991, Proceedings. 1991 IEEE International Conference on Robotics and Automation.

[2]  Luc Van Gool,et al.  Hough Forests for Object Detection, Tracking, and Action Recognition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Bodo Rosenhahn,et al.  Three-Dimensional Shape Knowledge for Joint Image Segmentation and Pose Tracking , 2007, International Journal of Computer Vision.

[4]  Antonis A. Argyros,et al.  Tracking the articulated motion of two strongly interacting hands , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Jovan Popović,et al.  Real-time hand-tracking with a color glove , 2009, SIGGRAPH 2009.

[6]  Danica Kragic,et al.  Grasp Recognition for Programming by Demonstration , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[7]  James M. Rehg,et al.  Statistical Color Models with Application to Skin Detection , 2004, International Journal of Computer Vision.

[8]  Jitendra Malik,et al.  Twist Based Acquisition and Tracking of Animal and Human Kinematics , 2004, International Journal of Computer Vision.

[9]  Nassir Navab,et al.  Adaptive neighborhood selection for real-time surface normal estimation from organized point cloud data using integral images , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[10]  Björn Stenger,et al.  Shape context and chamfer matching in cluttered scenes , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[11]  Dimitrios Tzionas,et al.  A Comparison of Directional Distances for Hand Pose Estimation , 2013, GCPR.

[12]  Marc Levoy,et al.  Real-time 3D model acquisition , 2002, ACM Trans. Graph..

[13]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Luc Van Gool,et al.  Motion Capture of Hands in Action Using Discriminative Salient Points , 2012, ECCV.

[15]  Luc Van Gool,et al.  Tracking a hand manipulating an object , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  John P. Lewis,et al.  Pose Space Deformation: A Unified Approach to Shape Interpolation and Skeleton-Driven Deformation , 2000, SIGGRAPH.

[17]  David W. Murray,et al.  Regression-based Hand Pose Estimation from Multiple Cameras , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[18]  Daniel P. Huttenlocher,et al.  Distance Transforms of Sampled Functions , 2012, Theory Comput..

[19]  Frédo Durand,et al.  A Fast Approximation of the Bilateral Filter Using a Signal Processing Approach , 2006, International Journal of Computer Vision.

[20]  Antonis A. Argyros,et al.  Full DOF tracking of a hand interacting with an object by modeling occlusions and physical constraints , 2011, 2011 International Conference on Computer Vision.

[21]  Antonis A. Argyros,et al.  Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.

[22]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[23]  Stan Sclaroff,et al.  3D hand pose reconstruction using specialized mappings , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[24]  Mircea Nicolescu,et al.  Vision-based hand pose estimation: A review , 2007, Comput. Vis. Image Underst..

[25]  Luc Van Gool,et al.  Functional categorization of objects using real-time markerless motion capture , 2011, CVPR 2011.

[26]  Bodo Rosenhahn,et al.  Model-Based Pose Estimation , 2011, Visual Analysis of Humans.

[27]  Takeo Kanade,et al.  Visual Tracking of High DOF Articulated Structures: an Application to Human Hand Tracking , 1994, ECCV.

[28]  Patrick Olivier,et al.  Digits: freehand 3D interactions anywhere using a wrist-worn gloveless sensor , 2012, UIST.

[29]  Danica Kragic,et al.  Hands in action: real-time 3D reconstruction of hands in interaction with objects , 2010, 2010 IEEE International Conference on Robotics and Automation.

[30]  Hans-Peter Seidel,et al.  Eurographics/siggraph Symposium on Computer Animation (2003) Construction and Animation of Anatomically Based Human Hand Models , 2022 .

[31]  Mohammad Ali Nekouie 3D Human Hand Posture Reconstruction Using a Single 2D Image , 2011 .

[32]  Stan Sclaroff,et al.  Estimating 3D hand pose from a cluttered image , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[33]  Richard M. Murray,et al.  A Mathematical Introduction to Robotic Manipulation , 1994 .

[34]  Paulo R. S. Mendonça,et al.  Model-based 3D tracking of an articulated hand , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[35]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[36]  Marc Levoy,et al.  Efficient variants of the ICP algorithm , 2001, Proceedings Third International Conference on 3-D Digital Imaging and Modeling.

[37]  Danica Kragic,et al.  Monocular real-time 3D articulated hand pose estimation , 2009, 2009 9th IEEE-RAS International Conference on Humanoid Robots.

[38]  Antonis A. Argyros,et al.  Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Takeo Kanade,et al.  Model-based tracking of self-occluding articulated objects , 1995, Proceedings of IEEE International Conference on Computer Vision.

[40]  Luc Van Gool,et al.  An object-dependent hand pose prior from sparse training data , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[41]  Gabriel Zachmann,et al.  Collision Detection for Deformable Objects , 2004, Comput. Graph. Forum.

[42]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[43]  Jovan Popovic,et al.  Automatic rigging and animation of 3D characters , 2007, ACM Trans. Graph..

[44]  David C. Hogg,et al.  Towards 3D hand tracking using a deformable model , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[45]  Michael Isard,et al.  Partitioned Sampling, Articulated Objects, and Interface-Quality Hand Tracking , 2000, ECCV.

[46]  Antti Oulasvirta,et al.  Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data , 2013, 2013 IEEE International Conference on Computer Vision.

[47]  Lale Akarun,et al.  Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests , 2012, ECCV.