Manipulative Action Recognition for Human-Robot Interaction

Recently, human-robot interaction has been receiving increasing interest in the robotics as well as the computer vision research community. From the robotics perspective, robots that cooperate with humans form an interesting application field that is expected to have high future market potential. Several global and mid-sized companies have introduced quite sophisticated robotic platforms designed for human-robot interaction. The ultimate goal is to place a robotic assistant or companion in people's everyday home environment, where they can communicate with the robot in a human-like fashion. As a consequence, “hearing” as well as “seeing”, the two most prominent and equally important modalities, are becoming major research issues. From the computer vision perspective, robot perception is more than just an interesting application field. Over the last decades, there has been a shift from solving isolated vision problems to modeling visual processing as an integral component of a cognitive system. This change in perspective pays tribute to important aspects of understanding dynamic visual scenes, such as attention, domain and task knowledge, spatio-temporal context, and a functional view of object categorization. The visual recognition of human actions lies at the center of all these aspects and provides a bridge for non-verbal as well as verbal communication between human and robot, both of which are highly ambiguous. It enables the robot to anticipate human actions, leading to pro-active robot behavior, especially in passive, more observational situations. Furthermore, it draws attention to manipulated objects or places, embeds objects in functional as well as task contexts, and focuses on the spatio-temporal dynamics of the scene.
