IRIS: English-Irish Machine Translation System

We describe IRIS, a statistical machine translation (SMT) system for translating between English and Irish in both directions. Irish is considered an under-resourced language with a limited amount of machine-readable text, which makes building a machine translation system that produces reasonable translations challenging. As translation is a difficult task, current research in SMT focuses on obtaining statistics from large amounts of parallel, monolingual, or other multilingual resources. Despite the scarcity of such resources for Irish, we collected the available English-Irish data and developed an SMT system aimed at supporting human translators and enabling cross-lingual language technology tasks.
