Factored Language Models for Statistical Machine Translation

Machine translation systems, as a whole, are currently unable to use the output of linguistic tools such as part-of-speech taggers to improve translation performance effectively. However, a new language modeling technique, Factored Language Models, can incorporate the additional linguistic information that these tools produce. In automatic speech recognition, Factored Language Models smoothed with Generalized Parallel Backoff have been shown to significantly reduce language model perplexity. Prior to this work, though, Factored Language Models had been applied to statistical machine translation only as part of a second-pass rescoring system. In this thesis, we show that a state-of-the-art phrase-based system using Factored Language Models with Generalized Parallel Backoff can improve performance over an identical system using trigram language models. These improvements hold both with and without additional word features. The relative gain from the Factored Language Models increases as the training corpus gets smaller, making this approach especially useful for domains with limited data.
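To make the abstract's terms concrete, here is a toy sketch of a factored language model with parallel backoff, not the thesis's system and not SRILM's implementation. It assumes each token is a bundle of two factors, (word, part-of-speech tag), and predicts the next word from the previous word and tag. When that full context is unseen, the model backs off along two paths at once (drop the word, or drop the tag) and combines them. The backoff graph, the absolute discount of 0.5, the choice of the mean as the combining function, and the names `FactoredBigramLM`, `_backoff`, `prob`, and `perplexity` are all illustrative assumptions. Perplexity is computed as 2 raised to the average negative log2 probability per predicted word.

```python
import math
from collections import Counter

# A token is a bundle of factors; here just (word, part-of-speech tag).
Token = tuple[str, str]


class FactoredBigramLM:
    """Toy factored LM estimating P(w_t | w_{t-1}, t_{t-1}).

    Hypothetical backoff graph used for illustration:
        {W1, T1}  ->  {W1} and {T1} in parallel  ->  unigram
    """

    def __init__(self, sentences: list[list[Token]], discount: float = 0.5):
        self.d = discount  # absolute discount, assumed 0 < d < 1
        self.vocab = sorted({w for sent in sentences for w, _ in sent})
        self.uni = Counter(w for sent in sentences for w, _ in sent)
        self.total = sum(self.uni.values())
        # Context counts and (context, next word) counts for each backoff node.
        self.ctx = {k: Counter() for k in ("WT", "W", "T")}
        self.joint = {k: Counter() for k in ("WT", "W", "T")}
        for sent in sentences:
            for (pw, pt), (w, _) in zip(sent, sent[1:]):
                for node, c in (("WT", (pw, pt)), ("W", (pw,)), ("T", (pt,))):
                    self.ctx[node][c] += 1
                    self.joint[node][(c, w)] += 1

    def _unigram(self, w: str) -> float:
        return self.uni[w] / self.total

    def _backoff(self, node, c, w, lower) -> float:
        """Absolute-discount backoff from context c at `node` to `lower`."""
        n = self.ctx[node][c]
        if n == 0:  # context never seen: defer entirely to the lower level
            return lower(w)
        if self.joint[node][(c, w)] > 0:
            return (self.joint[node][(c, w)] - self.d) / n
        seen = {v for v in self.vocab if self.joint[node][(c, v)] > 0}
        left = self.d * len(seen) / n  # probability mass freed by discounting
        unseen_mass = sum(lower(v) for v in self.vocab if v not in seen)
        return left * lower(w) / unseen_mass  # renormalized over unseen words

    def prob(self, prev: Token, w: str) -> float:
        pw, pt = prev

        def p_word(v):  # child node that keeps only the previous word
            return self._backoff("W", (pw,), v, self._unigram)

        def p_tag(v):  # child node that keeps only the previous tag
            return self._backoff("T", (pt,), v, self._unigram)

        def parallel(v):  # combine both backoff paths; mean is one simple choice
            return 0.5 * (p_word(v) + p_tag(v))

        return self._backoff("WT", (pw, pt), w, parallel)

    def perplexity(self, sentences: list[list[Token]]) -> float:
        log_sum, n = 0.0, 0
        for sent in sentences:
            for prev, (w, _) in zip(sent, sent[1:]):
                log_sum += math.log2(self.prob(prev, w))
                n += 1
        return 2 ** (-log_sum / n)


train = [
    [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
    [("the", "DT"), ("dog", "NN"), ("ran", "VBD")],
    [("a", "DT"), ("cat", "NN"), ("ran", "VBD")],
]
lm = FactoredBigramLM(train)
print(lm.prob(("a", "NN"), "ran"))  # unseen (word, tag) context: parallel backoff
print(lm.perplexity([[("the", "DT"), ("cat", "NN"), ("ran", "VBD")]]))  # → 4.0
```

The context ("a", "NN") never occurs in training, yet the tag path still carries evidence that "NN" tends to be followed by a verb. This is the kind of sharing across factors that a plain word trigram cannot exploit, and it matters most when training data is sparse.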

Amittai Axelrod
