An Italian to Catalan RBMT system reusing data from existing language pairs

This paper presents an Italian→Catalan rule-based machine translation (RBMT) system built automatically by combining the linguistic data of the existing Spanish-Catalan and Spanish-Italian pairs. Lightweight manual post-processing is then carried out to fix inconsistencies in the automatically derived dictionaries and to add very frequent words that, according to a corpus analysis, are missing. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER and GTM.
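The abstract does not spell out how the Italian-Catalan dictionaries are derived from the two Spanish pairs. A common approach is transitive composition through the shared pivot language (Spanish): each Italian entry is chained through its Spanish translations to the Catalan translations of those. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Dictionaries are assumed to be plain mappings from (lemma, part-of-speech) pairs to sets of translations, the toy entries and the helper names `compose`, `ambiguous` and `frequent_missing` are hypothetical, and corpus tokens are matched to lemmas without morphological analysis. The sketch also lists two kinds of candidates for manual post-processing: entries that end up with several targets (one possible source of inconsistency) and frequent corpus words that the derived dictionary does not cover.

```python
from collections import Counter, defaultdict

# Hypothetical representation: an entry is a (lemma, part-of-speech) pair.
Entry = tuple[str, str]
BiDict = dict[Entry, set[Entry]]


def compose(src_pivot: BiDict, pivot_tgt: BiDict) -> BiDict:
    """Chain a source->pivot and a pivot->target dictionary into source->target."""
    src_tgt: BiDict = defaultdict(set)
    for src, pivots in src_pivot.items():
        for piv in pivots:
            # Pivot entries with no target-side translation contribute nothing.
            src_tgt[src] |= pivot_tgt.get(piv, set())
    # Drop source entries whose pivot translations all lacked a target entry.
    return {src: tgts for src, tgts in src_tgt.items() if tgts}


def ambiguous(d: BiDict) -> list[Entry]:
    """Source entries with more than one target: candidates for manual review."""
    return [src for src, tgts in d.items() if len(tgts) > 1]


def frequent_missing(tokens: list[str], d: BiDict, top: int = 3) -> list[tuple[str, int]]:
    """Most frequent corpus tokens not covered by any source lemma in the dictionary.

    Simplification (assumption): tokens are compared to lemmas directly,
    with no morphological analysis.
    """
    known = {lemma for lemma, _pos in d}
    counts = Counter(tok.lower() for tok in tokens)
    return [(w, c) for w, c in counts.most_common() if w not in known][:top]


# Toy data, invented for illustration only.
it_es: BiDict = {
    ("casa", "n"): {("casa", "n")},
    ("cane", "n"): {("perro", "n")},
    ("prendere", "vb"): {("tomar", "vb"), ("coger", "vb")},
    ("parlare", "vb"): {("hablar", "vb")},
    ("ragazzo", "n"): {("chico", "n")},  # "chico" has no Spanish-Catalan entry below
}
es_ca: BiDict = {
    ("casa", "n"): {("casa", "n")},
    ("perro", "n"): {("gos", "n")},
    ("tomar", "vb"): {("prendre", "vb")},
    ("coger", "vb"): {("agafar", "vb")},
    ("hablar", "vb"): {("parlar", "vb")},
}

it_ca = compose(it_es, es_ca)
corpus = "il cane mangia il pane il ragazzo prende il pane".split()

print(it_ca[("cane", "n")])                   # {('gos', 'n')}
print(ambiguous(it_ca))                       # [('prendere', 'vb')]
print(frequent_missing(corpus, it_ca))        # [('il', 4), ('pane', 2), ('mangia', 1)]
```

In this toy run `ragazzo` is lost because its Spanish translation `chico` has no entry in the Spanish-Catalan dictionary, and the article `il` tops the list of missing words because the toy dictionary has no closed-class entries. A real pipeline would lemmatize the corpus first and would likely use richer criteria than target count to detect inconsistencies.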
