Evaluation Metrics For Language Models

The most widely used evaluation metric for language models in speech recognition is the perplexity of test data. While perplexity can be calculated efficiently and without access to a speech recognizer, it often does not correlate well with speech recognition word-error rate. In this research, we attempt to find a measure that, like perplexity, is easily calculated, but which better predicts speech recognition performance. We investigate two approaches. First, we attempt to extend perplexity with similar measures that use information about language models that perplexity ignores. Second, we attempt to imitate the word-error calculation without using a speech recognizer by artificially generating speech recognition lattices. To test our new metrics, we built over thirty varied language models. We find that perplexity correlates with word-error rate remarkably well when only n-gram models trained on in-domain data are considered. When other types of models are considered, our novel metrics are superior to perplexity for predicting speech recognition performance. However, we conclude that none of these measures predicts word-error rate accurately enough to be an effective tool for language model evaluation in speech recognition.
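As a point of reference for the metric discussed above, the standard per-word perplexity of test data can be computed directly from a model's per-word probabilities. The sketch below assumes the model's probabilities are already available as a list; the function names are illustrative, not from the paper.

```python
import math

def perplexity(word_log_probs):
    """Per-word perplexity from natural-log word probabilities.

    PP = exp( -(1/N) * sum_i log p(w_i | history_i) )
    """
    n = len(word_log_probs)
    avg_log_prob = sum(word_log_probs) / n
    return math.exp(-avg_log_prob)

# Toy example: a model that assigns probability 0.25 to every word
# in an 8-word test sequence yields a perplexity of exactly 4.0.
log_probs = [math.log(0.25)] * 8
print(perplexity(log_probs))  # -> 4.0
```

Lower perplexity indicates that the model finds the test data less "surprising"; the abstract's point is that this quantity, while cheap to compute, is only a loose proxy for word-error rate.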
