Improving back-off models with bag of words and hollow-grams

Classical n-gram models lack robustness on unseen events. The literature proposes several smoothing methods; empirically, the most effective of these is the modified Kneser-Ney approach. We propose to improve this back-off model: our method reorders back-off values according to the mutual information of the words and introduces a new hollow-gram model. Results show that our back-off model yields significant improvements over the modified Kneser-Ney baseline. We obtain a 0.6% absolute word error rate improvement without acoustic adaptation, and 0.4% after adaptation, with a 3xRT ASR system.
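To make the baseline concrete, the following is a minimal sketch of a bigram back-off model with absolute discounting, a simplified relative of the modified Kneser-Ney smoothing the paper builds on (it backs off to a maximum-likelihood unigram rather than continuation counts, and does not implement the paper's mutual-information reordering or hollow-gram model; all names are illustrative):

```python
from collections import Counter

def train_bigram_backoff(tokens, discount=0.75):
    """Interpolated bigram model with absolute discounting.

    P(w|v) = max(c(v,w) - D, 0) / c(v) + lambda(v) * P_uni(w),
    where lambda(v) = D * N1+(v,.) / c(v) redistributes the discounted
    mass to the lower-order (unigram) distribution.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    # number of distinct words observed after each history word
    followers = Counter(v for v, _ in bigrams)

    def prob(v, w):
        p_uni = unigrams[w] / total
        c_v = unigrams[v]
        if c_v == 0:
            return p_uni  # unseen history: fall back to the unigram model
        backoff_mass = discount * followers[v] / c_v
        return max(bigrams[(v, w)] - discount, 0) / c_v + backoff_mass * p_uni

    return prob
```

For any history seen in training (other than the final token), the probabilities over the vocabulary sum to one, which is the normalization property that back-off value reordering must preserve.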
