Improving back-off models with bag of words and hollow-grams

Classical n-gram models lack robustness on unseen events. The literature proposes several smoothing methods; empirically, the most effective of these is the modified Kneser-Ney approach. We propose to improve this back-off model: our method reorders back-off values according to the mutual information of the words and introduces a new hollow-gram model. Results show that our back-off model yields significant improvements over the modified Kneser-Ney baseline. We obtain a 0.6% absolute word error rate improvement without acoustic adaptation, and 0.4% after adaptation, with a 3xRT ASR system.
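To make the baseline concrete, the following is a minimal sketch of a bigram back-off model with absolute discounting, a simplified relative of the modified Kneser-Ney smoothing the paper builds on (it backs off to a maximum-likelihood unigram rather than continuation counts, and does not implement the paper's mutual-information reordering or hollow-gram model; all names are illustrative):

```python
from collections import Counter

def train_bigram_backoff(tokens, discount=0.75):
    """Interpolated bigram model with absolute discounting.

    P(w|v) = max(c(v,w) - D, 0) / c(v) + lambda(v) * P_uni(w),
    where lambda(v) = D * N1+(v,.) / c(v) redistributes the discounted
    mass to the lower-order (unigram) distribution.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    # number of distinct words observed after each history word
    followers = Counter(v for v, _ in bigrams)

    def prob(v, w):
        p_uni = unigrams[w] / total
        c_v = unigrams[v]
        if c_v == 0:
            return p_uni  # unseen history: fall back to the unigram model
        backoff_mass = discount * followers[v] / c_v
        return max(bigrams[(v, w)] - discount, 0) / c_v + backoff_mass * p_uni

    return prob
```

For any history seen in training (other than the final token), the probabilities over the vocabulary sum to one, which is the normalization property that back-off value reordering must preserve.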
