MIDOS: multimodal interactive dialogue system

Interactions between people are typically conversational, multimodal, and symmetric. In conversational interactions, information flows in both directions. In multimodal interactions, people use multiple channels. In symmetric interactions, both participants communicate multimodally, integrating and switching between modalities essentially without effort. In contrast, consider typical human-computer interaction. It is almost always unidirectional: we tell the machine what to do. It is almost always unimodal (can you type and use the mouse simultaneously?). And it is symmetric only in the disappointing sense that when you type, it types back at you. There is a great deal wrong with this picture. Chief among the problems is that unidirectional communication must be complete and unambiguous, exhaustively anticipating every detail and every possible misinterpretation. In brief, it is exhausting.

This thesis examines the benefits of multimodal human-computer dialogues that employ sketching and speech, aimed initially at the task of describing early-stage designs of simple mechanical devices. The goal of the system is to be a collaborative partner that facilitates design conversations. Two initial user studies provided key insights into multimodal communication: simple questions are powerful, color choices are deliberate, and modalities are closely coordinated. These observations formed the basis for our multimodal interactive dialogue system, MIDOS. MIDOS makes possible a dynamic dialogue, i.e., one in which it asks questions to resolve uncertainties or ambiguities.

The benefits of dialogue in reducing the cognitive overhead of communication have long been known. We show here that enabling the system to ask questions is valuable, but for an unstructured task like describing a design, knowing which questions to ask is crucial. We describe an architecture that enables the system to accept partial information from the user and then request the details it considers relevant, noticeably lowering the cognitive overhead of communicating. The multimodal questions MIDOS asks are, in addition, deliberately designed to follow the same multimodal integration pattern that people exhibited in our studies. Our evaluation showed that MIDOS successfully engages the user in a dialogue and produces the same conversational features observed in our initial human-human conversation studies.
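The core loop described in the abstract, accept partial information, decide which open question matters most, and ask it, can be pictured as a small dialogue-manager skeleton. The sketch below is purely illustrative and is not drawn from the thesis: the Component and DialogueManager classes, the attribute names, and the fixed relevance weights are all assumptions standing in for MIDOS's actual reasoning.

```python
# Illustrative sketch only: a minimal question-asking loop in the spirit of the
# abstract (accept partial input, pick the most relevant open question, ask it).
# All names here are hypothetical, not the thesis's actual architecture.

from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    attributes: dict = field(default_factory=dict)  # facts gathered from sketch/speech
    unknowns: set = field(default_factory=set)      # attributes still unresolved


class DialogueManager:
    # Hypothetical weights: how much each missing attribute is assumed to matter.
    RELEVANCE = {"anchor_point": 3.0, "spring_constant": 2.0, "mass": 1.5, "color": 0.1}

    def __init__(self):
        self.components: list[Component] = []

    def observe(self, component: Component) -> None:
        """Accept partial information as it arrives from the user."""
        self.components.append(component)

    def next_question(self) -> str | None:
        """Pick the single most relevant unresolved attribute and phrase a question."""
        candidates = [
            (self.RELEVANCE.get(attr, 1.0), comp, attr)
            for comp in self.components
            for attr in comp.unknowns
        ]
        if not candidates:
            return None
        _, comp, attr = max(candidates, key=lambda c: c[0])
        return f"Could you tell me about the {comp.name}'s {attr.replace('_', ' ')}?"


if __name__ == "__main__":
    dm = DialogueManager()
    dm.observe(Component("pendulum", {"shape": "circle"}, {"mass", "anchor_point"}))
    dm.observe(Component("spring", {}, {"spring_constant"}))
    print(dm.next_question())  # asks about the anchor point first (highest assumed weight)
```

In the thesis, deciding what to ask is tied to resolving the ambiguities that matter for understanding the device's behavior; the fixed weight table above merely stands in for that reasoning as a placeholder.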
