MaTrEx: the DCU machine translation system for IWSLT 2007

In this paper, we describe the machine translation system developed at DCU and used for our second participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2007). For this participation, we focus on new methods to improve system quality. Specifically, we apply our word packing technique to different language pairs; we smooth our translation tables with out-of-domain word translations for the Arabic–English and Chinese–English tasks in order to reduce the high number of out-of-vocabulary items; and we deploy a translation-based model for case and punctuation restoration. We participated in both the classical and challenge tasks for the following translation directions: Chinese–English, Japanese–English and Arabic–English. For the last two tasks, we translated both the single-best ASR hypotheses and the correct recognition results; for Chinese–English, we translated only the correct recognition results. We report the results of the system on the provided evaluation sets, together with some additional experiments carried out after we identified some simple tokenisation errors in the official runs.
