Implementation and evaluation of a constraint-based multimodal fusion system for speech and 3D pointing gestures

This paper presents an architecture for fusing multimodal input streams for natural interaction with a humanoid robot, together with results from a user study of the system. The fusion architecture consists of an application-independent parser of input events and a set of application-specific rules. In the user study, participants interacted with a robot in a kitchen scenario using speech and gesture input. We observed that the fusion approach is very tolerant of falsely detected pointing gestures, because speech serves as the main modality and pointing gestures are used mainly to disambiguate objects. We also report on the temporal correlation of speech and gesture events observed in the study.
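The speech-led idea described above can be illustrated with a minimal sketch. It is not the paper's constraint-based parser: the event classes, the `fuse` function, the scene dictionary, and the one-second time window are all hypothetical names and values chosen for illustration. The sketch only shows why a design in which speech is the main modality can ignore spurious pointing gestures: a gesture is consulted only when the spoken object description is ambiguous.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SpeechEvent:
    """Hypothetical parsed utterance, e.g. 'bring me the cup'."""
    start: float        # seconds
    end: float          # seconds
    action: str
    object_type: str


@dataclass
class PointingEvent:
    """Hypothetical detected pointing gesture resolved to a scene object."""
    time: float         # seconds
    target_id: str


def fuse(speech: SpeechEvent,
         gestures: List[PointingEvent],
         scene: Dict[str, str],
         window: float = 1.0) -> Optional[str]:
    """Return the id of the referenced object, or None if still ambiguous.

    scene maps object ids to object types. The window (an assumed value)
    bounds how far a gesture may lie outside the utterance interval.
    """
    # Speech is the main modality: it selects the candidate objects.
    candidates = [oid for oid, otype in scene.items()
                  if otype == speech.object_type]
    if len(candidates) == 1:
        # Unambiguous from speech alone; any gestures are ignored.
        return candidates[0]

    # Gestures are used only to disambiguate among the spoken candidates.
    nearby = [g for g in gestures
              if speech.start - window <= g.time <= speech.end + window
              and g.target_id in candidates]
    if not nearby:
        return None

    # Prefer the gesture closest in time to the middle of the utterance.
    mid = (speech.start + speech.end) / 2.0
    return min(nearby, key=lambda g: abs(g.time - mid)).target_id


scene = {"cup_1": "cup", "cup_2": "cup", "plate_1": "plate"}

# A falsely detected gesture at a cup does not disturb "the plate".
plate_request = SpeechEvent(0.0, 1.5, "bring", "plate")
false_gesture = [PointingEvent(0.7, "cup_1")]
plate_result = fuse(plate_request, false_gesture, scene)

# "The cup" is ambiguous, so the pointing gesture resolves it.
cup_request = SpeechEvent(10.0, 11.2, "bring", "cup")
cup_gesture = [PointingEvent(10.4, "cup_2")]
cup_result = fuse(cup_request, cup_gesture, scene)

# Without any gesture, the ambiguous request stays unresolved.
unresolved = fuse(cup_request, [], scene)
```

In this sketch an unresolved reference is returned as `None`, which a dialogue component could turn into a clarification question; how the actual system handles that case is not stated in the abstract.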
