SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task

This paper describes 12 systems submitted to the WMT16 IT-task, covering six languages: Basque, Bulgarian, Dutch, Czech, Portuguese and Spanish. All of these systems were developed within the QTLeap project and follow a common strategy. For each language, two systems were submitted: a phrase-based MT system built with Moses, and a system exploiting deep language engineering approaches, which was implemented with TectoMT for every language except Bulgarian. For 4 of the 6 languages, the TectoMT-based system performs better than the Moses-based one.
