Linking Speech and Gesture in Multimodal Instruction Systems

This paper analyses the timing of gesture and speech acts in the MIBL corpus of free-flowing human-to-human instruction dialogues. Based on this analysis, an algorithm is proposed to establish the pairing between the instructor's speech and gestures. It is shown that correct pairing requires both timing and semantic information. Further work will explore the use of this algorithm in unconstrained, free-flowing multimodal instruction dialogues between a human and a robot. A brief overview is given of a robotic system that is able to learn a card game from a human teacher.
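To illustrate the kind of pairing step described above, the following is a minimal sketch, assuming speech acts and gestures are represented as timed, labelled segments and that a semantic compatibility table between labels is available. The data structures, scoring functions, and greedy matching strategy here are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: pair speech acts with gestures using both timing
# and semantic information. Segment fields and the compatibility table
# are assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds from start of dialogue
    end: float
    label: str     # e.g. "place_card", "point_at_pile" (hypothetical labels)


def temporal_score(speech: Segment, gesture: Segment) -> float:
    """Higher when the two segments overlap or lie close together in time."""
    gap = max(speech.start, gesture.start) - min(speech.end, gesture.end)
    return 1.0 if gap <= 0 else 1.0 / (1.0 + gap)


def semantic_score(speech: Segment, gesture: Segment,
                   compatibility: dict) -> float:
    """Look up how plausible this gesture label is given the speech-act label."""
    return compatibility.get((speech.label, gesture.label), 0.0)


def pair_speech_and_gesture(speech_acts, gestures, compatibility, threshold=0.5):
    """Greedily pair each speech act with its best-scoring unused gesture,
    combining the timing and semantic scores."""
    pairs = []
    free = set(range(len(gestures)))
    for s in speech_acts:
        best, best_score = None, threshold
        for i in free:
            g = gestures[i]
            score = temporal_score(s, g) * semantic_score(s, g, compatibility)
            if score > best_score:
                best, best_score = i, score
        if best is not None:
            pairs.append((s, gestures[best]))
            free.remove(best)
    return pairs
```

In this sketch, a pairing is accepted only when a gesture is both temporally close to the speech act and semantically compatible with it, reflecting the paper's finding that timing alone is insufficient for correct pairing.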
