Mutual disambiguation of recognition errors in a multimodal architecture

As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well-designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, and then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, and also higher levels of MD for the more challenging accented users. As a result, although speech recognition as a stand-alone performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform in a more robust and stable manner than individual recognition technologies. Also discussed is the design of interfaces that support diversity in tangible ways, and that function well under challenging real-world usage conditions.
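The basic mechanism behind mutual disambiguation can be illustrated with a toy sketch: each recognizer emits an n-best list, the integrator scores the cross-product of hypotheses, and semantic compatibility constraints allow a lower-ranked hypothesis in one mode to overtake an incompatible top choice from the other. The Python below is purely illustrative and is not the architecture evaluated in the study; the hypothesis lists, semantic types, and scoring scheme are invented for the example.

```python
from itertools import product

# Hypothetical n-best lists from a speech and a gesture recognizer.
# Scores are made-up posteriors; "type" is a toy semantic category used
# to check cross-modal compatibility.
speech_nbest = [
    {"text": "zone out",   "type": "unknown",     "score": 0.40},  # misrecognition ranked first
    {"text": "zoom out",   "type": "map_command", "score": 0.35},  # correct, ranked second
    {"text": "zoom route", "type": "route_query", "score": 0.25},
]
gesture_nbest = [
    {"shape": "circle_area", "type": "map_command", "score": 0.55},
    {"shape": "point",       "type": "selection",   "score": 0.45},
]

def compatible(speech_hyp, gesture_hyp):
    """Stand-in for a richer integration test: hypotheses combine only if
    their semantic types are mutually consistent."""
    return speech_hyp["type"] == gesture_hyp["type"]

def integrate(speech, gesture):
    """Score every speech-gesture pair, keep only compatible ones, and rank
    the joint interpretations. A lower-ranked but compatible speech hypothesis
    can overtake an incompatible top-ranked one -- mutual disambiguation."""
    joint = [
        (s, g, s["score"] * g["score"])
        for s, g in product(speech, gesture)
        if compatible(s, g)
    ]
    return sorted(joint, key=lambda item: item[2], reverse=True)

if __name__ == "__main__":
    best_speech, best_gesture, score = integrate(speech_nbest, gesture_nbest)[0]
    # Prints "zoom out + circle_area": the gesture evidence recovers the
    # correct command even though speech alone ranked a misrecognition first.
    print(best_speech["text"], "+", best_gesture["shape"], f"(joint score {score:.2f})")
```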
