Optimization in Multimodal Interpretation

In a multimodal conversation, the way users communicate with a system depends on the available interaction channels and the situated context (e.g., conversation focus, visual feedback). These dependencies form a rich set of constraints from various perspectives, such as the temporal alignment between modalities, the coherence of the conversation, and the domain semantics. There is strong evidence that competition and ranking among these constraints are important for achieving an optimal interpretation. We have therefore developed an optimization approach to multimodal interpretation, in particular for interpreting multimodal references. A preliminary evaluation indicates the effectiveness of this approach, especially for complex user inputs that involve multiple referring expressions in a speech utterance combined with multiple gestures.
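
To make the idea of constraint competition concrete, here is a minimal Python sketch, not the paper's actual algorithm. The constraint set (semantic type compatibility and temporal alignment between gesture and speech), the weights, the example data, and the brute-force search over assignments are all illustrative assumptions.

```python
from itertools import product

# All names, weights, and data below are hypothetical, for illustration only.

# Candidate objects on the display: a semantic type plus the time (in
# seconds) at which the user gestured toward the object, if at all.
objects = [
    {"id": "house_1", "type": "house", "gesture_time": 1.2},
    {"id": "house_2", "type": "house", "gesture_time": 3.5},
    {"id": "road_1",  "type": "road",  "gesture_time": None},
]

# Referring expressions extracted from the speech utterance, with the
# semantic type they expect and the time at which they were spoken.
expressions = [
    {"text": "this house", "expected_type": "house", "speech_time": 1.0},
    {"text": "that house", "expected_type": "house", "speech_time": 3.3},
]

# Constraint weights encode the ranking: semantic compatibility
# dominates temporal alignment.
W_SEMANTIC, W_TEMPORAL = 10.0, 1.0

def score(expr, obj):
    """Score one expression-object pairing under the competing constraints."""
    s = 0.0
    # Semantic constraint: the object's type should match the expression.
    if obj["type"] == expr["expected_type"]:
        s += W_SEMANTIC
    # Temporal constraint: a gesture close in time to the spoken
    # expression is rewarded (the reward shrinks as the time gap grows).
    if obj["gesture_time"] is not None:
        s += W_TEMPORAL / (1.0 + abs(obj["gesture_time"] - expr["speech_time"]))
    return s

def best_assignment(expressions, objects):
    """Exhaustively rank all assignments of distinct objects to the
    referring expressions and return the highest-scoring one. Brute
    force suffices for this toy; a real system would need a scalable
    optimization method (e.g., graph matching)."""
    best, best_total = None, float("-inf")
    for combo in product(objects, repeat=len(expressions)):
        if len({o["id"] for o in combo}) < len(combo):
            continue  # distinct expressions must resolve to distinct objects
        total = sum(score(e, o) for e, o in zip(expressions, combo))
        if total > best_total:
            best, best_total = combo, total
    return {e["text"]: o["id"] for e, o in zip(expressions, best)}

print(best_assignment(expressions, objects))
# {'this house': 'house_1', 'that house': 'house_2'}
```

In this toy, the semantic constraint rules out the road regardless of temporal proximity, while the temporal constraint decides between the two houses. Changing the weights reorders the constraint ranking and can change the winning interpretation, which is the core intuition behind treating multimodal reference resolution as constraint-based optimization.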
