Handling uncertainty in multimodal pervasive computing applications

Multimodal interaction can improve the accessibility of pervasive computing applications. However, the recognition-based interaction techniques used in multimodal interfaces (e.g. speech and gesture recognition) are still error-prone. Recognition errors and misinterpretations can compromise the security, robustness, and efficiency of pervasive computing applications. In this paper, we briefly review the error handling strategies found in the multimodal interaction literature. We then discuss the new challenges that novel affective and context-aware applications raise for error correction. We show that traditional multimodal error handling strategies are ill-adapted to pervasive computing applications, where computing devices have become invisible and users may not be aware of their own behaviour. Finally, we present an original experimental study of how users synchronise speech and pen inputs during error correction. The results suggest that users are likely to modify their synchronisation patterns in the belief that doing so can help error resolution. This study is a first step towards a better understanding of spontaneous user strategies for error correction in multimodal interfaces and pervasive environments.
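
The kind of synchronisation analysis the study refers to can be illustrated with a minimal sketch: classifying a speech/pen input pair as simultaneous (temporally overlapping) or sequential (one following the other) from event timestamps. The `InputEvent` structure, the field names, and the one-second gap threshold below are illustrative assumptions, not the study's actual data format or method.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    """A recognised input event with start/end timestamps in seconds.
    Hypothetical structure for illustration only."""
    modality: str  # e.g. "speech" or "pen"
    start: float
    end: float


def integration_pattern(speech: InputEvent, pen: InputEvent,
                        gap_threshold: float = 1.0) -> str:
    """Classify a speech/pen pair as 'simultaneous' (temporal overlap) or
    'sequential' (one modality follows the other within gap_threshold).
    The threshold value is an arbitrary illustrative choice."""
    overlap = min(speech.end, pen.end) - max(speech.start, pen.start)
    if overlap > 0:
        return "simultaneous"
    gap = max(speech.start, pen.start) - min(speech.end, pen.end)
    return "sequential" if gap <= gap_threshold else "unrelated"


# Example: pen input begins shortly after the speech segment ends.
print(integration_pattern(InputEvent("speech", 0.0, 1.2),
                          InputEvent("pen", 1.5, 2.0)))  # -> "sequential"
```

Tracking how the proportion of simultaneous versus sequential pairs changes between first attempts and error-correction attempts is one simple way to detect the shift in synchronisation patterns described above.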
