DFKI’s experimental hybrid MT system for WMT 2015

DFKI participated in the shared translation task of WMT 2015 with the German-English language pair in both translation directions. The submissions were generated using an experimental hybrid system based on three systems: a statistical Moses system, a commercial rule-based system, and a serial coupling of the two, in which the output of the rule-based system is further translated by a Moses system trained on parallel text consisting of the rule-based output and the original target-language text. The outputs of the three systems are combined using two methods: (a) an empirical selection mechanism based on grammatical features (primary submission) and (b) IBM1 models based on POS 4-grams (contrastive submission).
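To make the contrastive combination method more concrete, the following is a minimal sketch of how an IBM1-style score over POS 4-grams could be used to pick one of several system outputs for a sentence. It is an illustration under stated assumptions, not the system's actual implementation: the function names (`pos_ngrams`, `ibm1_log_score`, `select_best`), the flooring constant, the length normalisation, and the toy probability table are all hypothetical, and a real table of translation probabilities would have to be estimated from POS-tagged parallel data.

```python
import math
from typing import Dict, List, Tuple

Unit = Tuple[str, ...]          # one POS n-gram, treated as a single "word"
NULL: Unit = ("<null>",)        # IBM1 empty source unit


def pos_ngrams(tags: List[str], n: int = 4) -> List[Unit]:
    """Return all contiguous n-grams of a POS tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def ibm1_log_score(
    src_units: List[Unit],
    tgt_units: List[Unit],
    t_table: Dict[Tuple[Unit, Unit], float],
    floor: float = 1e-6,        # assumed back-off for unseen pairs
) -> float:
    """Length-normalised IBM1 log-probability of target units given source units."""
    if not tgt_units:
        return float("-inf")
    src = [NULL] + list(src_units)
    total = 0.0
    for t in tgt_units:
        # IBM1: average the lexical probability over all source positions.
        p = sum(t_table.get((t, s), floor) for s in src) / len(src)
        total += math.log(p)
    return total / len(tgt_units)


def select_best(
    src_tags: List[str],
    candidates: Dict[str, List[str]],
    t_table: Dict[Tuple[Unit, Unit], float],
    n: int = 4,
) -> str:
    """Return the name of the candidate whose POS n-grams score highest."""
    src_units = pos_ngrams(src_tags, n)
    return max(
        candidates,
        key=lambda name: ibm1_log_score(
            src_units, pos_ngrams(candidates[name], n), t_table
        ),
    )
```

In this sketch each POS 4-gram plays the role that a word plays in the standard IBM1 model, and the candidate with the highest normalised log-probability is selected sentence by sentence. Whether scores are normalised by length, and in which direction the model is applied, are design choices that are assumed here rather than taken from the source.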
