Technical report: OpenMaTrEx, a free, open-source hybrid data-driven machine translation system

This report describes OpenMaTrEx, a free/open-source hybrid data-driven machine translation system containing core example-based components based on the marker hypothesis. OpenMaTrEx comprises a marker-driven chunker, a collection of chunk aligners, tools to merge ("hybridise") marker-based and statistical translation tables, two engines (a simple proof-of-concept monotone "example-based" recombination engine and a statistical decoder based on Moses), and support for automatic evaluation. It also contains support for "word packing" to improve alignment. OpenMaTrEx is a free/open-source release of the basic components of MaTrEx, the Dublin City University machine translation system. The components and processes implemented in OpenMaTrEx are described in both theoretical and functional detail. Additionally, experimental results compare OpenMaTrEx with plain statistical machine translation on representative tasks.
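As a rough illustration of what marker-driven chunking means, the sketch below splits a tokenised sentence at closed-class "marker" words, on the assumption that a new chunk may start at a marker only once the current chunk already holds at least one non-marker word. It is not OpenMaTrEx code: the function name marker_chunk, the tiny English MARKERS set, and the handling of trailing markers are all hypothetical choices made for this example.

```python
# Illustrative sketch only; not taken from OpenMaTrEx.
# MARKERS is a hypothetical, tiny English list. A real system would use
# much larger per-language lists of closed-class words.
MARKERS = {"the", "a", "an", "in", "on", "at", "of", "to", "with", "and", "it"}


def marker_chunk(tokens, markers=MARKERS):
    """Split tokens into chunks that begin at marker words.

    A marker opens a new chunk only if the current chunk already
    contains at least one non-marker word, so runs of consecutive
    markers stay together at the start of a chunk.
    """
    chunks = []
    current = []
    has_content = False  # True once `current` holds a non-marker word
    for tok in tokens:
        is_marker = tok.lower() in markers
        if is_marker and has_content:
            chunks.append(current)
            current = []
            has_content = False
        current.append(tok)
        if not is_marker:
            has_content = True
    if current:
        if chunks and not has_content:
            # Trailing markers with no content word: attach to previous chunk.
            chunks[-1].extend(current)
        else:
            chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(marker_chunk("the man in the house reads a book".split()))
```

With the sample sentence, the chunks are "the man", "in the house reads", and "a book". Chunks of this kind, produced on both sides of a parallel corpus, are what the chunk aligners would then pair up.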
