Vs and OOVs: Two Problems for Translation between German and English

In this paper we report on experiments with three preprocessing strategies for improving translation output in a statistical MT system. In training, two reordering strategies were studied: (i) reorder on the basis of the alignments from Giza++, and (ii) reorder by moving all verbs to the end of segments. In translation, out-of-vocabulary words were preprocessed in a knowledge-lite fashion to identify a likely equivalent. All three strategies were implemented for our English↔German system submitted to the WMT10 shared task. Combining them led to improvements in both language directions.
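Strategy (ii) can be illustrated with a minimal sketch. The abstract does not say how segments are delimited or how verbs are identified, so everything here is an assumption made for illustration: tokens arrive as (word, POS tag) pairs, verbs are the tokens whose tag starts with "V" (as in Penn Treebank-style tagsets), and segments end at punctuation tokens. The function names are hypothetical and are not taken from the system described.

```python
from typing import FrozenSet, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tagset is an assumption


def move_verbs_to_end(segment: List[Token], verb_prefix: str = "V") -> List[Token]:
    """Return the segment with all verb tokens moved to the end.

    Relative order is preserved within the non-verbs and within the verbs.
    """
    non_verbs = [tok for tok in segment if not tok[1].startswith(verb_prefix)]
    verbs = [tok for tok in segment if tok[1].startswith(verb_prefix)]
    return non_verbs + verbs


def reorder_sentence(
    tokens: List[Token],
    boundary_tags: FrozenSet[str] = frozenset({",", ".", ":"}),
) -> List[Token]:
    """Split a sentence at punctuation (assumed segment boundaries) and
    apply verb-final reordering inside each segment."""
    output: List[Token] = []
    segment: List[Token] = []
    for tok in tokens:
        if tok[1] in boundary_tags:
            # Close the current segment, then keep the boundary token in place.
            output.extend(move_verbs_to_end(segment))
            output.append(tok)
            segment = []
        else:
            segment.append(tok)
    # Flush a trailing segment that has no closing punctuation.
    output.extend(move_verbs_to_end(segment))
    return output


if __name__ == "__main__":
    sentence = [
        ("he", "PRP"), ("has", "VBZ"), ("read", "VBN"),
        ("the", "DT"), ("book", "NN"), (".", "."),
    ]
    print(" ".join(word for word, _ in reorder_sentence(sentence)))
```

Under these assumptions, the English side of the training data would be reordered before word alignment, so that its verbs sit in segment-final position in the same way on every sentence.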
