Rapid adaptation of n-gram language models using inter-word correlation for speech recognition

In this paper, we study the problem of fast adaptation of n-gram language models under the maximum a posteriori (MAP) estimation framework. We propose a heuristic method that exploits inter-word correlation to accelerate MAP adaptation of n-gram models. Based on these correlations, the occurrence of one word can be used to predict all other words in the adaptation text. In this way, a large n-gram model can be adapted efficiently with a small amount of adaptation data. The proposed approach is evaluated on a Japanese newspaper corpus. We observe a significant perplexity reduction even with only several hundred adaptation sentences.
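As a rough illustration of the idea, the following sketch applies a standard Dirichlet-prior MAP update to a unigram model and, before the update, spreads each observed count to correlated words. It is not the paper's algorithm: the abstract does not give the formulas, so the function names (`map_adapt`, `spread_counts`), the correlation table, the prior weight `tau`, and the toy vocabulary are all assumptions made for illustration, and a unigram model stands in for the general n-gram case.

```python
def map_adapt(prior_probs, counts, tau):
    """MAP estimate under a Dirichlet prior centered on prior_probs.

    p(w) = (c(w) + tau * p0(w)) / (N + tau), where N is the total count.
    Assumes every counted word is in the prior vocabulary.
    """
    total = sum(counts.values())
    return {
        w: (counts.get(w, 0.0) + tau * p0) / (total + tau)
        for w, p0 in prior_probs.items()
    }


def spread_counts(counts, correlation):
    """Hypothetical correlation step (an assumption, not from the paper).

    Each observed word v adds correlation[v][w] * c(v) fractional counts
    to every word w it is correlated with.
    """
    expanded = {w: float(c) for w, c in counts.items()}
    for v, c in counts.items():
        for w, r in correlation.get(v, {}).items():
            expanded[w] = expanded.get(w, 0.0) + r * c
    return expanded


# Toy background model: four equally likely words.
prior_probs = {"stock": 0.25, "market": 0.25, "goal": 0.25, "match": 0.25}

# Tiny adaptation text: "stock" observed twice, nothing else.
counts = {"stock": 2}

# Assumed correlation table: seeing "stock" is weak evidence for "market".
correlation = {"stock": {"market": 0.5}}

tau = 4.0  # prior weight (assumed value)

# Plain MAP adaptation uses only the observed counts.
plain = map_adapt(prior_probs, counts, tau)

# Correlation-assisted adaptation also updates the unseen word "market".
adapted = map_adapt(prior_probs, spread_counts(counts, correlation), tau)
```

With plain MAP, only "stock" gains probability from the adaptation text, while the correlation step also raises "market" although it was never observed. This is the sense in which a small amount of adaptation data can update many parameters at once.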
