Efficient language model adaptation through MDI estimation

This paper presents a method for n-gram language model adaptation based on the principle of minimum discrimination information. A background language model is adapted to fit constraints on its marginal distributions that are derived from new observed data. This work gives a different derivation of the model by Kneser et al. (1997) and extends its application to interpolated language models. The proposed method has been evaluated on an Italian 60K-word broadcast news task.
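The minimum discrimination information principle can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a toy joint distribution over (history, word) pairs and a single constraint on the word marginal. In that case the distribution closest to the background model in Kullback-Leibler divergence has a closed form, in which each background probability is scaled by the ratio of target to background marginal. The names `mdi_adapt_joint`, `background` and `target` are hypothetical, and the numbers are invented for illustration.

```python
import math


def mdi_adapt_joint(p_background, target_marginal):
    """Rescale a joint distribution p_background[(h, w)] so that its
    marginal over w equals target_marginal, staying as close as possible
    to the background in KL divergence (single marginal constraint)."""
    # Marginal of the background model over the predicted word w.
    bg_marginal = {}
    for (h, w), p in p_background.items():
        bg_marginal[w] = bg_marginal.get(w, 0.0) + p

    # Scale every entry by target(w) / background(w).
    return {
        (h, w): p * target_marginal[w] / bg_marginal[w]
        for (h, w), p in p_background.items()
    }


def kl_divergence(p, q):
    """KL(p || q) for two dicts defined over the same keys."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0.0)


# Toy background joint distribution over (history, word); illustrative only.
background = {
    ("the", "news"): 0.30, ("the", "match"): 0.20,
    ("a", "news"): 0.10, ("a", "match"): 0.40,
}
# Word marginal assumed to be estimated from the new adaptation data.
target = {"news": 0.7, "match": 0.3}

adapted = mdi_adapt_joint(background, target)
```

In this single-constraint setting, the divergence between the adapted and background joint distributions equals the divergence between the target and background word marginals, so the adapted model moves no further from the background than the constraint requires. Adapting conditional n-gram probabilities, as the paper does, needs additional normalization over histories that this sketch does not cover.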

[1] Salim Roukos et al. Language model adaptation via minimum discrimination information. 1995 International Conference on Acoustics, Speech, and Signal Processing, 1995.

[2] Ronald Rosenfeld et al. Adaptive Statistical Language Modeling: A Maximum Entropy Approach. 1994.

[3] Salim Roukos et al. MDI adaptation of language models across corpora. EUROSPEECH, 1997.

[4] Thomas M. Cover et al. Elements of Information Theory. 2005.

[5] Dietrich Klakow et al. Language model adaptation using dynamic marginals. EUROSPEECH, 1997.

[6] Hermann Ney et al. On structuring probabilistic dependences in stochastic language modelling. Comput. Speech Lang., 1994.

[7] J. Darroch et al. Generalized Iterative Scaling for Log-Linear Models. 1972.

[8] Ronald Rosenfeld et al. Trigger-based language models: a maximum entropy approach. 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1993.

[9] Robert L. Mercer et al. Adaptive Language Modeling Using Minimum Discriminant Estimation. HLT, 1992.

[10] Fabio Brugnara et al. Dynamic language models for interactive speech applications. EUROSPEECH, 1997.

[11] Marcello Federico et al. Bayesian estimation methods for n-gram language model adaptation. Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96), 1996.

[12] Thomas M. Cover et al. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). 2006.

[13] Slava M. Katz et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Trans. Acoust. Speech Signal Process., 1987.