3D Hand and Object Tracking for Inside Out Activity Analysis

This paper investigates the “inside-out” recognition of everyday manipulation tasks using a gaze-directed camera, i.e., a camera that is actively directed at the visual attention focus of the person wearing it. We present EYEWATCHME, an integrated vision and state estimation system that simultaneously tracks the positions and poses of the acting hands, the pose of the manipulated object, and the pose of the observing camera. Taken together, EYEWATCHME provides comprehensive data for learning predictive models of vision-guided manipulation, including the objects people attend to, the interaction of attention and reaching/grasping, and the segmentation of reaching and grasping using visual attention as evidence. The first key technical contribution of this paper is an ego-view hand tracking system that estimates 27-DOF hand poses. The hand tracker detects hands and estimates their poses despite substantial self-occlusion caused by the hand itself and occlusion caused by the manipulated object, and it copes with images blurred by rapid eye movements. The second key contribution is the integrated activity recognition system that simultaneously tracks the attention of the person, the hand poses, and the poses of the manipulated objects in global scene coordinates. We demonstrate the operation of EYEWATCHME in the context of kitchen tasks, including filling a cup with water.
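To make the 27-DOF hand tracking problem concrete, the sketch below illustrates a CONDENSATION-style particle filter over an articulated hand state. This is a hypothetical, minimal illustration, not the paper's implementation: the 27-dimensional layout (6 global pose parameters plus 21 joint angles), the Gaussian process noise, the toy likelihood, and all function and variable names are assumptions introduced here for clarity.

```python
import numpy as np

# Assumed 27-DOF hand state: 6 global pose parameters (translation +
# rotation) plus 21 joint angles -- a common articulated-hand
# parameterization; the paper's exact layout is not specified here.
N_DOF = 27

def condensation_step(particles, weights, likelihood_fn, noise_std=0.05, rng=None):
    """One CONDENSATION-style update: resample by weight, diffuse
    with Gaussian process noise, then reweight by image likelihood."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(particles)
    # 1. Resample particle indices proportionally to their weights.
    idx = rng.choice(n, size=n, p=weights / weights.sum())
    # 2. Predict: perturb resampled states with process noise.
    new_particles = particles[idx] + rng.normal(0.0, noise_std, size=(n, N_DOF))
    # 3. Measure: reweight each particle by its observation likelihood.
    new_weights = np.array([likelihood_fn(p) for p in new_particles])
    new_weights /= new_weights.sum()
    return new_particles, new_weights

# Toy likelihood standing in for an image-based hand appearance model:
# prefer states near a fixed "observed" pose (here, the zero vector).
target = np.zeros(N_DOF)
like = lambda x: np.exp(-np.sum((x - target) ** 2))

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 0.5, size=(200, N_DOF))
weights = np.ones(200) / 200
for _ in range(20):
    particles, weights = condensation_step(particles, weights, like, rng=rng)

estimate = weights @ particles  # weighted-mean 27-DOF pose estimate
```

In a real system the toy likelihood would be replaced by a comparison between the rendered hand model and image evidence; maintaining a full particle set rather than a single hypothesis is what lets such trackers survive the self-occlusions and motion blur discussed above.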
