Methods for combining language models in speech recognition

Statistical language models play a vital role in contemporary speech recognition systems, and numerous language models have been proposed in the literature. The best results have been achieved when different language models are used in combination. Several combination methods have been proposed, but few comparisons between them have been made. In this work, three combination methods that have been used with language models are studied. In addition, a new approach based on estimating the likelihood density function with histograms is presented. The methods are evaluated in speech recognition experiments and in perplexity calculations. The test data consist of Finnish news articles, and four language models serve as the component models. In the perplexity experiments, all combination methods produced a statistically significant improvement over the 4-gram model that served as the baseline. The best result, a 46 % improvement over the 4-gram model, was achieved by combining three language models with the new bin estimation method. In the speech recognition experiments, the unigram rescaling method achieved a 4 % reduction in word error rate and an over 7 % reduction in phoneme error rate.
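The abstract names unigram rescaling explicitly but does not spell out the combination formulas. As a rough illustration only, the sketch below shows two standard ways of combining component language model probabilities (linear interpolation and unigram rescaling) along with the perplexity measure used for evaluation. The function names, the interpolation weight `lam`, and the scaling exponent `beta` are illustrative assumptions, not the paper's implementation, and the rescaled scores would need renormalization over the vocabulary to form a proper distribution.

```python
import math

def interpolate(p_ngram: float, p_component: float, lam: float = 0.5) -> float:
    """Linear interpolation: a weighted mixture of two component
    model probabilities for the same word in the same context.
    (lam is an assumed, tunable weight.)"""
    return lam * p_ngram + (1.0 - lam) * p_component

def unigram_rescale(p_ngram: float, p_topic_unigram: float,
                    p_background_unigram: float, beta: float = 1.0) -> float:
    """Unigram rescaling: boost or damp the n-gram probability by how
    much a topic model favours the word relative to the background
    unigram distribution. Returns an unnormalized score; a proper
    probability requires dividing by a normalizer summed over the
    vocabulary for the given history."""
    return p_ngram * (p_topic_unigram / p_background_unigram) ** beta

def perplexity(word_probs: list[float]) -> float:
    """Perplexity of a test set, given the model probability assigned
    to each word: 2 to the power of the average negative log2 probability."""
    avg_neg_log2 = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** avg_neg_log2

# Example: a model assigning these per-word probabilities to a 4-word
# test string has perplexity 2**(-(log2 0.1 + log2 0.2 + log2 0.05 + log2 0.1)/4).
print(perplexity([0.1, 0.2, 0.05, 0.1]))
```

A lower perplexity on held-out text indicates a better-fitting model, which is why the paper reports combination gains as percentage reductions relative to the 4-gram baseline.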
