Monocular real-time 3D articulated hand pose estimation

Markerless, vision-based estimation of human hand pose over time is a prerequisite for a number of robotics applications, such as learning by demonstration (LbD), health monitoring, teleoperation, and human-robot interaction. It is of special interest for humanoid platforms, where the number of degrees of freedom makes conventional programming challenging. Our primary application is LbD in natural environments, where the humanoid robot learns how to grasp and manipulate objects by observing a human performing a task. This paper presents a method for continuous vision-based estimation of human hand pose. The method is non-parametric, performing a nearest neighbor search in a large database (100,000 entries) of hand pose examples. The main contribution is a real-time system, robust to partial occlusions and segmentation errors, that provides full hand pose recognition from markerless data. An additional contribution is the modeling of constraints based on temporal consistency in hand pose, without explicitly tracking the hand in the high-dimensional pose space. The pose representation is rich enough to enable a descriptive human-to-robot mapping. Experiments show the pose estimation to be more robust and accurate than a non-parametric method without temporal constraints.
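The idea of combining a nearest neighbor lookup with a temporal consistency constraint can be illustrated with a minimal sketch. The abstract does not specify the image features, the distance measure, or how the temporal constraint is weighted, so everything below is an assumption made for illustration: the Euclidean distances, the additive cost, and the names `knn_indices`, `estimate_pose`, `k`, and `weight` are hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_indices(db_features, query, k):
    """Return the indices of the k rows of db_features closest to query,
    sorted by increasing Euclidean distance, plus all distances."""
    dists = np.linalg.norm(db_features - query, axis=1)
    k = min(k, len(dists))
    nearest = np.argpartition(dists, k - 1)[:k]
    return nearest[np.argsort(dists[nearest])], dists


def estimate_pose(db_features, db_poses, query, prev_pose=None, k=8, weight=1.0):
    """Pick a pose from the database for the current frame.

    Assumed scheme (not the paper's): among the k nearest neighbors in
    feature space, prefer the one whose stored pose is also close to the
    pose chosen for the previous frame.
    """
    idx, dists = knn_indices(db_features, query, k)
    if prev_pose is None:
        # First frame: no temporal information, take the best appearance match.
        return db_poses[idx[0]]
    # Penalize candidates that would require a large jump in pose space.
    jump = np.linalg.norm(db_poses[idx] - prev_pose, axis=1)
    cost = dists[idx] + weight * jump
    return db_poses[idx[np.argmin(cost)]]


if __name__ == "__main__":
    # Toy database: 3 entries, 2-D features, 1-D "pose" (illustrative only).
    features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
    poses = np.array([[0.0], [10.0], [20.0]])
    query = np.array([0.04, 0.0])
    print(estimate_pose(features, poses, query, k=2))
    print(estimate_pose(features, poses, query, prev_pose=np.array([10.0]), k=2))
```

In this toy setup the two closest database entries look almost identical in feature space but store very different poses. Without a previous pose the best appearance match wins; with one, the candidate consistent with the previous frame is preferred. This is the kind of ambiguity a temporal constraint is meant to resolve without tracking in the full pose space.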
