Paper Information - Improved Language Modeling for Statistical Machine Translation

Statistical machine translation systems use a combination of one or more translation models and a language model. While there is a significant body of research addressing the improvement of translation models, the problem of optimizing language models for a specific translation task has not received much attention. Typically, standard word trigram models are used as an out-of-the-box component in a statistical machine translation system. In this paper we apply language modeling techniques that have proved beneficial in automatic speech recognition to the ACL05 machine translation shared data task and demonstrate improvements over a baseline system with a standard language model.

Mei Yang | Katrin Kirchhoff
