Automatic Speech Recognition Errors Detection and Correction: A Review

Abstract: Even though Automatic Speech Recognition (ASR) has matured to the point of commercial application, high error rates in some speech recognition domains remain one of the main impediments to the wide adoption of speech technology, especially for continuous large-vocabulary speech recognition applications. The persistent presence of ASR errors has intensified the need for techniques that automatically detect and correct such errors. Correcting transcription errors is crucial not only for improving speech recognition accuracy, but also for preventing errors from propagating to subsequent language processing modules such as machine translation. In this paper, the basic principles of ASR evaluation are first summarized, and then the current state of research on ASR error detection and correction is reviewed. We focus on emerging techniques that use the word error rate metric.
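For readers unfamiliar with the metric, word error rate (WER) is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, divided by the number of words in the reference. The following is a minimal illustrative sketch in Python, not code from the paper: the function name `word_error_rate` is hypothetical, and tokenization by whitespace with no text normalization is an assumption made for brevity.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance divided by reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against the reference length only, WER computed this way can exceed 1.0 when the output is much longer than the reference.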
