Enhanced MAP adaptation of n-gram language models using indirect correlation of distant words

We propose a heuristic method for adapting n-gram language models to a new domain. In addition to the conventional n-gram correlation, which captures only direct, surface-level information about adjacent words, the method uses indirect correlation between words that are distant from each other. Adding the correlation of distant words gives the adapted models more information about word co-occurrence in the target domain and improves their perplexity reduction. Furthermore, because the new correlation covers indirect relations that do not appear in surface sentences, the adapted models still work well in domains that differ somewhat from the target domain. Experiments show that, compared with well-known MAP-based adaptation, the proposed method improves perplexity reduction by approximately 10%, both in the target domain and in another domain.
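For orientation, the MAP-based baseline that the abstract compares against can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes a bigram model, an unsmoothed maximum-likelihood prior estimated from a general corpus, and a common textbook form of the MAP estimate, P(w|h) = (c_target(h,w) + tau * P_general(w|h)) / (c_target(h) + tau). The function names, the toy corpora, and the prior weight `tau` are hypothetical. The distant-word correlation term is not shown, because the abstract does not specify how it is computed.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count bigrams (h, w) and their histories h in tokenized sentences."""
    pair_counts, hist_counts = Counter(), Counter()
    for sent in sentences:
        for h, w in zip(sent, sent[1:]):
            pair_counts[(h, w)] += 1
            hist_counts[h] += 1
    return pair_counts, hist_counts


def map_adapted_prob(h, w, general, target, tau=2.0):
    """MAP-style estimate of P(w | h) (assumed form, see lead-in).

    The general-corpus estimate acts as a prior worth `tau` pseudo-counts,
    and the target-domain counts update it.
    """
    gen_pairs, gen_hist = general
    tgt_pairs, tgt_hist = target
    # Prior: unsmoothed ML estimate from the general corpus (0 if h unseen).
    prior = gen_pairs[(h, w)] / gen_hist[h] if gen_hist[h] else 0.0
    return (tgt_pairs[(h, w)] + tau * prior) / (tgt_hist[h] + tau)


# Toy corpora (hypothetical data, for illustration only).
general = bigram_counts([["the", "cat", "sat"], ["the", "dog", "sat"]])
target = bigram_counts([["the", "cat", "ran"], ["the", "cat", "sat"]])

p_cat = map_adapted_prob("the", "cat", general, target, tau=2.0)
p_dog = map_adapted_prob("the", "dog", general, target, tau=2.0)
print(p_cat, p_dog)
```

In this toy setting the target data shifts probability mass toward "cat" after "the", while "dog" keeps some mass from the general prior even though it never follows "the" in the target corpus. A larger `tau` keeps the adapted model closer to the general one.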
