An event-related brain potential study on the impact of speech recognition errors

Most automatic speech recognition (ASR) systems, which aim for a perfect transcription of utterances, are trained and tuned by minimizing the word error rate (WER). In this framework, all errors (substitutions, deletions, insertions) on any word are treated uniformly, even though their impact is not the same. How large that impact is, and exactly how the error types differ, remains unknown. Several studies have proposed alternatives to the WER metric, but no analysis has investigated how the human brain processes language and perceives the effect of erroneous ASR output. In this research, we conduct an event-related brain potential (ERP) study and directly analyze brain activity in response to ASR errors. Our results reveal that the peak amplitudes of the positive shift following substitution and deletion violations are much larger than those following insertion violations. This finding indicates that humans perceive each error differently, depending on its impact on the whole sentence. Based on these findings, we formulated a new weighted word error rate metric derived from the ERP results: ERP-WWER. We re-evaluated ASR performance using the new ERP-WWER metric and compared and discussed the results against the standard WER.
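The abstract does not give the exact form of ERP-WWER, but the idea of weighting error types can be illustrated with a minimal sketch. The standard WER is (S + D + I) / N; a weighted variant scales each error count by a per-type weight. The function name and the weights w_sub, w_del, w_ins below are hypothetical placeholders chosen only to reflect the qualitative finding that insertions elicited smaller ERP responses; they are not the values used in the study.

    # Minimal sketch of a weighted word error rate (illustrative; not the paper's exact ERP-WWER).
    # Standard WER = (S + D + I) / N; the weighted variant scales each error type.

    def weighted_wer(num_sub, num_del, num_ins, num_ref_words,
                     w_sub=1.0, w_del=1.0, w_ins=0.5):
        """Weighted WER with placeholder weights; insertions are down-weighted
        only to mirror the qualitative ERP finding, not measured amplitudes."""
        if num_ref_words == 0:
            raise ValueError("reference must contain at least one word")
        return (w_sub * num_sub + w_del * num_del + w_ins * num_ins) / num_ref_words

    # Example: 3 substitutions, 1 deletion, 2 insertions over a 50-word reference.
    print(weighted_wer(3, 1, 2, 50))                    # 0.10 (weighted)
    print(weighted_wer(3, 1, 2, 50, 1.0, 1.0, 1.0))     # 0.12 (reduces to standard WER)

With equal weights the measure reduces to the standard WER, so any re-evaluation with ERP-derived weights changes only how strongly each error type is penalized, not which events count as errors.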
