Unification-based Multimodal Integration

Recent empirical research has shown conclusive advantages of multimodal interaction over speech-only interaction for map-based tasks. This paper describes a multimodal language processing architecture which supports interfaces allowing simultaneous input from speech and gesture recognition. Integration of spoken and gestural input is driven by unification of typed feature structures representing the semantic contributions of the different modes. This integration method allows the component modalities to mutually compensate for each other's errors. It is implemented in QuickSet, a multimodal (pen/voice) system that enables users to set up and control distributed interactive simulations.
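To illustrate the integration method described above, the following is a minimal sketch of feature-structure unification combining a partial spoken interpretation with a partial gestural one. It is not QuickSet's implementation; the representation (nested dicts), type names, features, and values are all hypothetical and chosen only to show how unification merges compatible partial structures and fails on incompatible ones.

```python
# Minimal sketch of typed-feature-structure unification for multimodal
# integration. Illustrative only, not QuickSet's actual implementation;
# all type names, features, and values are hypothetical.

FAIL = object()  # sentinel for unification failure


def unify(a, b):
    """Unify two feature structures represented as nested dicts.

    Atomic values unify only if equal; dicts unify feature-by-feature.
    Returns FAIL if the structures are incompatible.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is FAIL:
                    return FAIL
                result[feature] = merged
            else:
                result[feature] = value
        return result
    return a if a == b else FAIL


# Partial interpretation from speech: "medical unit here" (location unfilled)
speech_fs = {
    "type": "create_unit",
    "object": {"unit_class": "medical"},
    "location": {"type": "point"},
}

# Partial interpretation from a pen gesture: a point on the map
gesture_fs = {
    "location": {"type": "point", "coord": (42.37, -122.87)},
}

combined = unify(speech_fs, gesture_fs)
print(combined)
# {'type': 'create_unit', 'object': {'unit_class': 'medical'},
#  'location': {'type': 'point', 'coord': (42.37, -122.87)}}
```

In this sketch, mutual compensation falls out of the failure case: if one recognizer's hypothesis produces a structure that cannot unify with the other mode's contribution (e.g., a gesture typed as an area where speech requires a point), the combination is rejected, so implausible cross-modal pairings are filtered out.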
