Getting it right the second time: Recognition of spoken corrections

We investigate ways to improve recognition accuracy on spoken corrections. We show that a variety of simple techniques can greatly improve accuracy on corrections. We further develop a flexible merge model that improves accuracy by combining information from the original recognition and the spoken correction. Our merge model operates on word confusion networks and can easily incorporate prior beliefs about the recognition events (e.g., which words are likely correct or incorrect). By combining all of our techniques, we increased the percentage of correctly recognized spoken corrections from 21% to 53%.
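To make the idea of combining two recognition results concrete, the following is a minimal toy sketch and not the merge model developed in this work. It assumes each word confusion network is a list of slots mapping candidate words to posterior probabilities, and that the two networks are already aligned slot by slot (real corrections usually cover only part of the original utterance, so this is a strong simplification). The names `merge_slot`, `merge_networks`, `best_path`, `corr_weight`, and `FLOOR` are all hypothetical; `corr_weight` stands in for a prior belief about how much to trust the correction over the original recognition.

```python
from typing import Dict, List

Slot = Dict[str, float]        # candidate word -> posterior probability
ConfusionNetwork = List[Slot]  # one slot per word position

# Hypothetical back-off probability for a word missing from one slot.
FLOOR = 1e-3


def merge_slot(orig: Slot, corr: Slot, corr_weight: float) -> Slot:
    """Log-linearly interpolate two aligned slots and renormalize."""
    vocab = set(orig) | set(corr)
    scores = {
        w: orig.get(w, FLOOR) ** (1.0 - corr_weight)
        * corr.get(w, FLOOR) ** corr_weight
        for w in vocab
    }
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def merge_networks(
    orig: ConfusionNetwork, corr: ConfusionNetwork, corr_weight: float = 0.5
) -> ConfusionNetwork:
    """Merge two networks, assuming they are already aligned slot by slot."""
    if len(orig) != len(corr):
        raise ValueError("this sketch assumes pre-aligned networks of equal length")
    return [merge_slot(o, c, corr_weight) for o, c in zip(orig, corr)]


def best_path(net: ConfusionNetwork) -> List[str]:
    """Pick the highest-probability word in every slot."""
    return [max(slot, key=slot.get) for slot in net]


# Made-up example data: the original recognition slightly prefers "cat",
# while the spoken correction prefers "cap".
original = [{"the": 0.9, "a": 0.1}, {"cat": 0.55, "cap": 0.45}, {"sat": 1.0}]
correction = [{"the": 0.8, "a": 0.2}, {"cap": 0.6, "cat": 0.4}, {"sat": 0.9, "sad": 0.1}]
merged_words = best_path(merge_networks(original, correction))
```

In this made-up example the original network alone yields "the cat sat", while the merged network prefers "cap" in the second slot because the correction's evidence outweighs the original's narrow preference. Setting `corr_weight` to 0 recovers the original hypothesis; a per-slot weight would be one way to express beliefs about which words are likely correct.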
