Apertium: a free/open-source platform for rule-based machine translation

Apertium is a free/open-source platform for rule-based machine translation. It is widely used to build machine translation systems for a variety of language pairs, especially where shallow transfer suffices to produce good-quality translations (mainly between related languages), although it has also proven useful in assimilation scenarios involving more distant pairs. This article summarises the Apertium platform: the translation engine, the encoding of linguistic data, and the tools developed around the platform. It also discusses the present limitations of the platform and the challenges it faces in the coming years. Finally, it presents evaluation results for some of the most active language pairs. An appendix describes Apertium as a free/open-source project.
