Using confidence scores to improve hands-free speech based navigation in continuous dictation systems

Speech recognition systems have improved dramatically, but recent studies confirm that error correction activities still account for 66-75% of users' time, and that 50% of that time is spent just getting to the errors that need to be corrected. While researchers have suggested that confidence scores could prove useful during the error correction process, the focus is typically on error detection. More importantly, empirical studies have failed to confirm any measurable benefits when confidence scores are used in this way within dictation-oriented applications. In this article, we provide data that explain why confidence scores are unlikely to be useful for error detection. We propose a new navigation technique for use when speech-only interaction is strongly preferred and a common desktop-sized display is available. We report the results of an empirical study that highlights the potential of this new technique. An informal comparison between the current study and previous research suggests that the new technique reduces the time spent on navigation by 18%. Future research should include additional studies that compare the proposed technique to previous non-speech and speech-based navigation solutions.
