Towards the Use of Word Stems and Suffixes for Statistical Machine Translation

In this paper we present methods for improving the quality of translation from an inflected language into English by making use of part-of-speech tags and of word stems and suffixes in the source language. Results for translation from Spanish and Catalan into English are reported on the LC-STAR trilingual corpus, which consists of spontaneously spoken dialogues in the domains of travelling and appointment scheduling. Results for translation from Serbian into English are reported on the Assimil language course, a bilingual corpus from an unrestricted domain. We achieve relative reductions in error rates of up to 5% for Spanish and Catalan and of about 8% for Serbian.
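To make the idea of splitting source words into stems and suffixes concrete, here is a minimal Python sketch. It rests on assumptions: the suffix list (`SPANISH_SUFFIXES`), the minimum stem length of three characters, and the `+` marker on suffix tokens are hypothetical choices for illustration, not the procedure or resources used in the paper. Each word is split at the longest listed suffix that still leaves a long enough stem, so inflected forms that share a stem map to a common token.

```python
# Hypothetical list of common Spanish inflectional endings (illustration only;
# NOT the suffix inventory used in the paper).
SPANISH_SUFFIXES = [
    "amos", "emos", "imos", "ando", "iendo",
    "aba", "ado", "ido",
    "ar", "er", "ir", "as", "es",
    "a", "o", "e",
]


def split_stem_suffix(word, suffixes, min_stem=3):
    """Split `word` at the longest suffix in `suffixes` that leaves a stem of
    at least `min_stem` characters; return (word, "") if none applies."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)], suf
    return word, ""


def split_sentence(sentence, suffixes, min_stem=3):
    """Lowercase and whitespace-tokenize `sentence`, then emit each stem
    followed by its suffix as a separate token marked with a leading '+'."""
    tokens = []
    for word in sentence.lower().split():
        stem, suf = split_stem_suffix(word, suffixes, min_stem)
        tokens.append(stem)
        if suf:
            tokens.append("+" + suf)
    return tokens


if __name__ == "__main__":
    print(split_sentence("Hablamos de la casa", SPANISH_SUFFIXES))
    # → ['habl', '+amos', 'de', 'la', 'cas', '+a']
```

With this hypothetical list, "hablamos", "hablaba", and "hablado" all share the stem token `habl`, which illustrates how stem-level units can reduce data sparseness for an inflected source language; short function words such as "de" and "la" are left intact by the minimum stem length.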

Maja Popovic, Hermann Ney