Real-time framework for multimodal human-robot interaction

This paper presents a new framework for real-time multimodal data processing. The framework comprises modules for different input and output signals and was designed for human-human and human-robot interaction scenarios. Individual modules for recording selected channels such as speech, gestures, or facial expressions can be combined with different output options (e.g. robot reactions) in a highly flexible manner. Depending on the included modules, both online and offline data processing are possible. The framework was used to analyze human-human interaction in order to gain insights into the factors that matter during interaction and into their dynamics. The recorded data comprise speech, facial expressions, gestures, and physiological signals. These naturally produced data were annotated and labeled in order to train recognition modules, which will be integrated into the existing framework. The overall aim is a system that can recognize and react to the same parameters that humans take into account during interaction. In this paper, the technical implementation and its application in a human-human and a human-robot interaction scenario are presented.
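
To make the module-composition idea concrete, the sketch below shows one possible way such a framework could be wired together: input modules record individual channels and publish timestamped samples onto a shared bus, and an output module consumes them and triggers a reaction. This is a minimal illustration under assumed names; `InputModule`, `OutputModule`, and `run_pipeline` are not the paper's actual API.

```python
# Minimal sketch of the modular pipeline idea described above.
# All class and method names are illustrative assumptions, not the
# framework's real interface.

import queue
import threading
import time


class InputModule(threading.Thread):
    """Records one channel (e.g. speech or gestures) and publishes
    timestamped samples to a shared bus."""

    def __init__(self, channel, bus):
        super().__init__(daemon=True)
        self.channel = channel
        self.bus = bus
        self.running = True

    def capture(self):
        # Placeholder for real sensor access (microphone, camera, ...).
        return {"channel": self.channel, "t": time.time()}

    def run(self):
        while self.running:
            self.bus.put(self.capture())
            time.sleep(0.1)  # simulated sensor rate


class OutputModule:
    """Consumes input events and triggers a reaction, e.g. a robot
    behavior online or a log entry for offline analysis."""

    def react(self, event):
        print(f"react to {event['channel']} at {event['t']:.2f}")


def run_pipeline(channels, output, duration=1.0):
    # Any set of input channels can be combined with any output option,
    # mirroring the flexible composition described in the abstract.
    bus = queue.Queue()
    modules = [InputModule(ch, bus) for ch in channels]
    for m in modules:
        m.start()
    deadline = time.time() + duration
    while time.time() < deadline:
        try:
            output.react(bus.get(timeout=0.2))
        except queue.Empty:
            pass
    for m in modules:
        m.running = False


if __name__ == "__main__":
    run_pipeline(["speech", "gesture"], OutputModule())
```

In this sketch, swapping the output module (or logging the bus contents instead of reacting to them) is what distinguishes online from offline operation, which matches the abstract's claim that the choice of included modules determines the processing mode.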
