Is word error rate a good indicator for spoken language understanding accuracy?

It is conventional wisdom in the speech community that better speech recognition accuracy is a good indicator of better spoken language understanding accuracy, given a fixed understanding component. The findings in this work reveal that this is not always the case. More important than reducing word error rate is training the language model for recognition to match the optimization objective for understanding. In this work, we applied a spoken language understanding model as the language model in speech recognition. The model was obtained with an example-based learning algorithm that optimized understanding accuracy. Although the resulting speech recognition word error rate is 46% higher than that of the trigram model, the overall slot understanding error can be reduced by as much as 17%.
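
The claim that word error rate and understanding accuracy can diverge can be made concrete with a toy comparison. The sketch below is not the paper's evaluation code. It computes WER with a standard word-level Levenshtein alignment and a simple slot error rate, defined here (as an assumption) as missed, wrong, or spurious slots divided by the number of reference slots. The utterance, the slot names (`from_city`, `to_city`, `date`), and both hypotheses with their slot outputs are invented for illustration.

```python
from typing import Dict, List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with a standard Levenshtein dynamic program over words."""
    n, m = len(reference), len(hypothesis)
    # dist[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # i deletions
    for j in range(m + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[n][m] / n


def slot_error_rate(reference: Dict[str, str], hypothesis: Dict[str, str]) -> float:
    """Assumed slot metric: (missed + wrong + spurious slots) / reference slots."""
    errors = sum(1 for slot, value in reference.items() if hypothesis.get(slot) != value)
    errors += sum(1 for slot in hypothesis if slot not in reference)  # spurious slots
    return errors / len(reference)


# Hypothetical ATIS-style utterance; the slot dictionaries stand in for the
# output of an understanding component and are hand-written for illustration.
REF_WORDS = "show me flights from boston to denver on monday".split()
REF_SLOTS = {"from_city": "boston", "to_city": "denver", "date": "monday"}

# Hypothesis A: only one word is wrong, but it is a slot value.
HYP_A_WORDS = "show me flights from austin to denver on monday".split()
HYP_A_SLOTS = {"from_city": "austin", "to_city": "denver", "date": "monday"}

# Hypothesis B: three word errors, all on carrier words outside the slots.
HYP_B_WORDS = "list flights from boston to denver monday".split()
HYP_B_SLOTS = {"from_city": "boston", "to_city": "denver", "date": "monday"}

if __name__ == "__main__":
    for name, words, slots in [("A", HYP_A_WORDS, HYP_A_SLOTS), ("B", HYP_B_WORDS, HYP_B_SLOTS)]:
        wer = word_error_rate(REF_WORDS, words)
        ser = slot_error_rate(REF_SLOTS, slots)
        print(name, f"WER={wer:.3f}", f"slot error={ser:.3f}")
    # A WER=0.111 slot error=0.333
    # B WER=0.333 slot error=0.000
```

Hypothesis B has three times the WER of hypothesis A yet yields the correct slots, because its errors fall on words the understanding component does not need. The numbers are illustrative only and are unrelated to the 46% and 17% figures reported in the abstract.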