A New Method Based on Context for Combining Statistical Language Models

In this paper we propose a new method to extract from a corpus the histories for which a given language model is better than another. The decision is based on a measure derived from perplexity; for a given history, this measure allows us to compare two language models and to choose the better one for that history. Using this principle, with a 20K-word vocabulary, we combined two language models: a bigram and a distant bigram. The contribution of the distant bigram is significant: the combined model outperforms the bigram model by 7.5%. Performance in the Shannon game is also improved. We show in this article that this framework is cheaper than the maximum entropy principle for combining language models. In addition, the selected histories for which one model is better than the other have been collected and studied; almost all of them are the beginnings of very frequently used French phrases. Finally, using the same principle, we obtain a model that is better than a trigram in terms of both parameters and perplexity; this model combines a bigram and a trigram on the basis of the selected histories.
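
To make the selection principle concrete, here is a minimal Python sketch. It assumes each language model is exposed as a function p(word, history), with the history taken to be the two preceding words; a standard bigram would look only at the nearest word, a distant bigram only at the farther one. Since the abstract only says the measure stems from perplexity without spelling it out, the score below (average held-out log-probability per history, which orders models the same way as per-history perplexity) is an illustrative stand-in, and all function names are hypothetical.

```python
# Sketch of per-history selection between two language models.
# Assumption: each model is a callable p(word, history) -> P(word | history),
# where history is the pair of preceding words. The comparison measure here
# (average log-probability per history) is an illustrative stand-in for the
# paper's perplexity-derived measure.
import math
from collections import defaultdict

def per_history_scores(corpus, model, floor=1e-10):
    """Average log P(word | history) for every history seen in `corpus`."""
    logs = defaultdict(list)
    for i in range(2, len(corpus)):
        history = (corpus[i - 2], corpus[i - 1])
        logs[history].append(math.log(max(model(corpus[i], history), floor)))
    return {h: sum(v) / len(v) for h, v in logs.items()}

def histories_where_b_wins(corpus, model_a, model_b):
    """Histories for which model_b scores better than model_a.

    Comparing average log-probabilities is equivalent to comparing
    per-history perplexities, since perplexity = exp(-average log prob):
    a higher average log-probability means a lower perplexity.
    """
    score_a = per_history_scores(corpus, model_a)
    score_b = per_history_scores(corpus, model_b)
    return {h for h in score_a if score_b[h] > score_a[h]}

def combined_model(word, history, model_a, model_b, b_histories):
    """Combined model: use model_b on its selected histories, else model_a."""
    return (model_b if history in b_histories else model_a)(word, history)
```

The combination step is what keeps this cheaper than maximum entropy: the selected histories are computed once on held-out text, and decoding then costs no more than a single table lookup per history to route between the two component models.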
