Paper information - On-line Language Model Biasing for Statistical Machine Translation

The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. Most SMT systems use a static LM, independent of the source language input. While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems. In this paper, we develop a novel measure of cross-lingual similarity for biasing the LM based on the test input. We also illustrate an efficient on-line implementation that supports integration with on-line SMT systems by transferring much of the computational load off-line. Our approach yields significant reductions in target perplexity compared to the static LM, as well as consistent improvements in SMT performance across language pairs (English-Dari and English-Pashto).
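
The general idea of biasing an LM toward the test input can be illustrated with a minimal sketch. The abstract does not specify the paper's cross-lingual similarity measure or its off-line/on-line split, so everything below is an assumption made for illustration: Jaccard word overlap on the source side stands in for the similarity measure, the LMs are add-alpha smoothed unigram models, the static and bias models are combined by linear interpolation with a hypothetical weight `lam`, and the parallel data uses placeholder tokens rather than real English, Dari, or Pashto text.

```python
import math
from collections import Counter

def unigram_lm(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram LM over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def jaccard(a, b):
    """Word-overlap similarity; a stand-in, NOT the paper's measure."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def biased_lm(source_input, parallel, static_lm, vocab, lam=0.5):
    """Interpolate the static LM with an LM built from the target sides
    of training pairs whose source side resembles the input."""
    similar_targets = [tgt for src, tgt in parallel
                       if jaccard(source_input, src) > 0.0]
    if not similar_targets:
        return dict(static_lm)  # nothing to bias toward
    bias = unigram_lm(similar_targets, vocab)
    return {w: (1 - lam) * static_lm[w] + lam * bias[w] for w in vocab}

def perplexity(lm, sentence):
    """Per-word perplexity of a sentence under a unigram LM."""
    words = sentence.split()
    log_prob = sum(math.log(lm[w]) for w in words)
    return math.exp(-log_prob / len(words))

# Toy parallel corpus with placeholder tokens (hypothetical data).
parallel = [("a b", "x y"), ("a c", "x z"), ("d e", "u v"), ("d f", "u w")]
vocab = sorted({w for _, tgt in parallel for w in tgt.split()})

static = unigram_lm([tgt for _, tgt in parallel], vocab)
biased = biased_lm("a b", parallel, static, vocab, lam=0.5)

# Compare target perplexity on the reference translation of the input.
print(perplexity(static, "x y"), perplexity(biased, "x y"))
```

In this toy setting the biased model assigns lower perplexity to the reference translation than the static model, which mirrors the kind of perplexity comparison the abstract reports. In a design along the lines the abstract describes, the per-sentence statistics would presumably be precomputed off-line so that only the selection and interpolation happen at translation time; the sketch does not attempt that.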

[1] Ralph Weischedel, et al. A Study of Translation Error Rate with Targeted Human Annotation, 2005.

[2] Richard M. Schwartz, et al. Language and Translation Model Adaptation using Comparable Corpora, 2008, EMNLP.

[3] David Yarowsky, et al. Statistical Machine Translation: Final Report, 1999.

[4] Stephan Vogel, et al. Language Model Adaptation for Statistical Machine Translation via Structured Query Models, 2004, COLING.

[5] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[6] Sanjeev Khudanpur, et al. Language Model Adaptation for Automatic Speech Recognition and Statistical Machine Translation, 2005.

[7] Daniel Marcu, et al. Statistical Phrase-Based Translation, 2003, NAACL.

[8] Robert L. Mercer, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, CL.

[9] Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.

[10] Matthew G. Snover, et al. A Study of Translation Edit Rate with Targeted Human Annotation, 2006, AMTA.

[11] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.