Exploiting portability to build an RBMT prototype for a new source language

This paper describes how we ported a deep-transfer rule-based machine translation system to a new source language, reusing existing resources as far as possible and keeping new development work to a minimum. Specifically, we report the changes and effort required in each of the system’s modules to obtain an English-Basque translator, ENEUS, starting from the Spanish-Basque Matxin system. In a human pairwise comparison of the new prototype against two statistical systems, ENEUS was preferred in over 30% of the test sentences.
