Estimating entropy of a language from optimal word insertion penalty

The relationship between the optimal value of the word insertion penalty and the entropy of the language is discussed, based on the hypothesis that the optimal word insertion penalty compensates for the gap between the probability given by a language model and the true probability. It is shown that the optimal word insertion penalty can be calculated as the difference between the test set entropy of the given language model and the true entropy of the given test set sentences. The correctness of the idea is confirmed through a recognition experiment, in which the entropy of a given set of sentences is estimated from two different language models and the word insertion penalty optimized for each language model.
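The stated relationship can be sketched numerically. The following is a minimal illustration, not the authors' procedure. It assumes that entropies are measured per word in bits, and that the penalty is expressed in the same units as a positive quantity, so that the true entropy is the test set entropy minus the optimal penalty. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def per_word_entropy(sentence_log2_probs, n_words):
    """Per-word test set entropy in bits.

    sentence_log2_probs: log2 probability that the language model assigns
    to each test sentence. n_words: total number of words in the test set.
    """
    return -sum(sentence_log2_probs) / n_words


def penalty_to_bits(penalty_ln):
    """Convert a penalty given in natural-log units to bits (assumed convention)."""
    return penalty_ln / math.log(2)


def estimate_true_entropy(lm_entropy_bits, optimal_penalty_bits):
    """Rearranged relationship: true entropy = LM test set entropy - optimal penalty."""
    return lm_entropy_bits - optimal_penalty_bits


# Hypothetical values for illustration only, not results from the paper.
h_lm = per_word_entropy([-20.0, -28.0], n_words=6)
h_true = estimate_true_entropy(h_lm, optimal_penalty_bits=1.5)
```

If the hypothesis holds, applying `estimate_true_entropy` to two different language models, each with its own optimized penalty, should give approximately the same value for the same test set.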
