Paper Information - Multi-class Model M

Multi-class Model M

Model M, a novel class-based exponential language model, has been shown to significantly outperform word n-gram models in state-of-the-art machine translation and speech recognition systems. The model was motivated by the observation that shrinking the sum of the parameter magnitudes in an exponential language model leads to better performance on unseen data. Being a class-based language model, Model M makes use of word classes that are found automatically from training data. In this paper, we extend Model M to allow for different clusterings to be used at different word positions. This is motivated by the fact that words play different roles depending on their position in an n-gram. Experiments on standard NIST and GALE Arabic-to-English development and test sets show improvements in machine translation quality as measured by automatic evaluation metrics.

Ahmad Emami | Stanley F. Chen
