Paper Information - Improving Statistical MT through Morphological Analysis

In statistical machine translation, estimating word-to-word alignment probabilities for the translation model can be difficult due to the problem of sparse data: most words in a given corpus occur at most a handful of times. With a highly inflected language such as Czech, this problem can be particularly severe. In addition, much of the morphological variation seen in Czech words is not reflected in either the morphology or syntax of a language like English. In this work, we show that using morphological analysis to modify the Czech input can improve a Czech-English machine translation system. We investigate several different methods of incorporating morphological information, and show that a system that combines these methods yields the best results. Our final system achieves a BLEU score of .333, as compared to .270 for the baseline word-to-word system.

Sharon Goldwater | David McClosky
