EYEWATCHME—3D Hand and object tracking for inside out activity analysis

This paper investigates the “inside-out” recognition of everyday manipulation tasks using a gaze-directed camera, i.e., a camera that is actively aimed at the visual attention focus of the person wearing it. We present EYEWATCHME, an integrated vision and state estimation system that simultaneously tracks the positions and poses of the acting hands, the pose of the manipulated object, and the pose of the observing camera. Taken together, EYEWATCHME provides comprehensive data for learning predictive models of vision-guided manipulation, including the objects people are attending to, the interaction of attention and reaching/grasping, and the segmentation of reaching and grasping using visual attention as evidence. The first key technical contribution of this paper is an ego-view hand tracking system that estimates 27-DOF hand poses. The hand tracking system can detect hands and estimate their poses despite substantial self-occlusion by the hand itself and occlusion by the manipulated object. EYEWATCHME can also cope with blurred images caused by rapid eye movements. The second key contribution is the integrated activity recognition system that simultaneously tracks the person's attention, the hand poses, and the poses of the manipulated objects in terms of global scene coordinates. We demonstrate the operation of EYEWATCHME in the context of kitchen tasks, including filling a cup with water.
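To illustrate the last point, expressing hand and object poses in global scene coordinates amounts to chaining the estimated ego-camera pose with the poses estimated in the camera frame. The sketch below is not taken from the paper; it is a minimal illustration of that composition, assuming poses are represented as 4x4 homogeneous transforms.

```python
import numpy as np

def to_homogeneous(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def to_scene_frame(T_scene_camera, T_camera_entity):
    """Express an entity pose (hand or object) estimated in the ego-camera frame
    in the global scene frame by composing it with the camera pose."""
    return T_scene_camera @ T_camera_entity

# Hypothetical example: camera located at (1.0, 0.2, 1.5) in the scene,
# with an object detected 0.5 m in front of it along the optical axis.
R_sc = np.array([[0.0,  0.0, 1.0],
                 [-1.0, 0.0, 0.0],
                 [0.0, -1.0, 0.0]])
T_scene_camera = to_homogeneous(R_sc, np.array([1.0, 0.2, 1.5]))
T_camera_object = to_homogeneous(np.eye(3), np.array([0.0, 0.0, 0.5]))

T_scene_object = to_scene_frame(T_scene_camera, T_camera_object)
print(T_scene_object[:3, 3])  # object position in global scene coordinates
```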
