Mixture of mixture n-gram language models

This paper presents a language model adaptation technique for building a single static language model from a set of language models, each trained on a separate text corpus, while aiming to maximize the likelihood of an adaptation data set given as a development set of sentences. The proposed model can be considered a mixture of mixture language models. The mixture model at the top level is a sentence-level mixture model in which each sentence is assumed to be drawn from one of a discrete set of topic or task clusters. After a cluster is selected, each n-gram is assumed to be drawn from one of the given n-gram language models. We estimate the cluster mixture weights and the n-gram language model mixture weights for each cluster using the expectation-maximization (EM) algorithm, seeking the parameter estimates that maximize the likelihood of the development sentences. This mixture of mixture models can be represented efficiently as a static n-gram language model using the previously proposed Bayesian language model interpolation technique. We show that this technique yields significant improvements in both perplexity and word error rate (WER) over the standard one-level interpolation scheme.
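For concreteness, one way to write the sentence likelihood implied by this description is sketched below; the notation is ours rather than taken from the paper ($\lambda_c$ the cluster mixture weights, $\theta_{c,m}$ the per-cluster language model mixture weights, $p_m$ the given component n-gram language models, and $h_i$ the n-gram history of word $w_i$):

\[
P(s) \;=\; \sum_{c=1}^{C} \lambda_c \prod_{i=1}^{|s|} \sum_{m=1}^{M} \theta_{c,m}\, p_m(w_i \mid h_i),
\qquad
\sum_{c=1}^{C} \lambda_c = 1, \quad \sum_{m=1}^{M} \theta_{c,m} = 1 \ \text{for each } c .
\]

Under this reading, EM iteratively adjusts $\{\lambda_c\}$ and $\{\theta_{c,m}\}$ to maximize $\sum_{s \in D} \log P(s)$ over the development set $D$.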