The TALP-UPC Phrase-Based Translation Systems for WMT13: System Combination with Morphology Generation, Domain Adaptation and Corpus Filtering

This paper describes the TALP participation in the WMT13 evaluation campaign. Our participation is based on the combination of several statistical machine translation systems, all built on standard phrase-based Moses systems. The variations include techniques such as morphology generation, training sentence filtering, and domain adaptation through unit derivation. The results show a consistent improvement in TER, METEOR, NIST, and BLEU scores over our baseline system.
