A new language model adaptation framework using modification of structures of background corpus and language model

This paper presents a new framework for language model adaptation based on modifying the structure of the background corpus and of the background language model. Within this framework, two widely used adaptation approaches, the Linear Interpolation (LI) method and the Minimum Discrimination Information (MDI) method, are used to modify the structure of the trained background language model, while the Maximum A Posteriori (MAP) approach is used to modify the structure of the background corpus. Experiments show that both techniques in the framework yield a significant reduction in perplexity relative to the LI, MAP and MDI methods in the general adaptation framework, of about 5.2%, 12.2% and 36.8%, respectively.
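For orientation, the following is a minimal sketch of the two baseline ideas named above, LI and MAP, together with the perplexity measure. It does not implement the proposed framework. It assumes a toy unigram model with add-alpha smoothing, and it treats MAP adaptation in its simplified count-merging form; the corpora, the function names, the interpolation weight `lam` and the count weight `weight` are all hypothetical choices made for illustration. MDI is omitted.

```python
import math
from collections import Counter


def unigram_probs(counts, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def linear_interpolation(p_bg, p_ad, lam):
    """LI baseline: mix the adaptation and background model probabilities."""
    return {w: lam * p_ad[w] + (1.0 - lam) * p_bg[w] for w in p_bg}


def map_count_merging(c_bg, c_ad, vocab, weight, alpha=1.0):
    """Simplified MAP-style baseline: add weighted adaptation counts to the
    background counts, then re-estimate the model from the merged counts."""
    merged = {w: c_bg.get(w, 0) + weight * c_ad.get(w, 0) for w in vocab}
    return unigram_probs(merged, vocab, alpha)


def perplexity(probs, tokens):
    """Perplexity = exp of the negative mean log-probability per token."""
    log_sum = sum(math.log(probs[t]) for t in tokens)
    return math.exp(-log_sum / len(tokens))


# Hypothetical toy data: a background corpus, a small in-domain adaptation
# corpus, and held-out in-domain text used only for evaluation.
background = "the market rose as the bank cut rates and the market cheered".split()
adaptation = "the team won the match and the fans cheered the team".split()
held_out = "the fans cheered as the team won".split()

vocab = sorted(set(background) | set(adaptation) | set(held_out))
c_bg, c_ad = Counter(background), Counter(adaptation)

p_bg = unigram_probs(c_bg, vocab)
p_ad = unigram_probs(c_ad, vocab)
p_li = linear_interpolation(p_bg, p_ad, lam=0.5)          # lam is an assumption
p_map = map_count_merging(c_bg, c_ad, vocab, weight=3.0)  # weight is an assumption

for name, model in [("background", p_bg), ("LI", p_li), ("MAP", p_map)]:
    print(f"{name}: perplexity = {perplexity(model, held_out):.2f}")
```

The sketch reflects the distinction drawn in the abstract: LI operates on models that have already been trained, whereas the count-merging form of MAP operates on corpus statistics before the model is estimated.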
