A Purely Monotonic Approach to Machine Translation for Similar Languages

This paper investigates the effect of taking a strictly monotonic approach to machine translation for a restricted set of suitable language pairs. We studied the effect of decoding monotonically on a set of language pairs with similar word order characteristics and found that for some of them, namely pairs in which both languages have SOV word order, there was almost no difference in machine translation quality. These results motivated extending the monotonic approach to the alignment stage of training. We used a Bayesian non-parametric aligner that has been shown to outperform GIZA++ combined with the grow-diag-final-and heuristic on transliteration data. Our results show that the monotonic aligner was able to match the performance of the GIZA++ baseline, and that gains in translation performance were obtained by integrating both aligners into the systems.
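The baseline above pairs GIZA++ with the grow-diag-final-and heuristic, which merges two directional word alignments into one. A minimal sketch of that heuristic, written from its commonly published description and not taken from the authors' systems, may help make the comparison concrete. The `count_crossings` helper is a hypothetical diagnostic added for illustration: it counts pairs of links that cross, so an alignment with zero crossings could be produced by a strictly monotonic model.

```python
from itertools import combinations
from typing import Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)

# 8-neighbourhood used by the "diag" variant of the growing step.
NEIGHBOURS = [(-1, 0), (0, -1), (1, 0), (0, 1),
              (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag_final_and(s2t: Set[Link], t2s: Set[Link]) -> Set[Link]:
    """Sketch of grow-diag-final-and symmetrisation of two directional alignments."""
    union = s2t | t2s
    alignment = set(s2t & t2s)  # start from the high-precision intersection
    aligned_src = {i for i, _ in alignment}
    aligned_tgt = {j for _, j in alignment}

    def add(link: Link) -> None:
        alignment.add(link)
        aligned_src.add(link[0])
        aligned_tgt.add(link[1])

    # GROW-DIAG: repeatedly add union links adjacent to existing links,
    # provided at least one of the two words is still unaligned.
    grew = True
    while grew:
        grew = False
        for i, j in sorted(alignment):  # iterate over a snapshot of current links
            for di, dj in NEIGHBOURS:
                cand = (i + di, j + dj)
                if cand in union and cand not in alignment and (
                        cand[0] not in aligned_src or cand[1] not in aligned_tgt):
                    add(cand)
                    grew = True

    # FINAL-AND: add leftover directional links only if BOTH words are unaligned.
    for directional in (s2t, t2s):
        for i, j in sorted(directional):
            if i not in aligned_src and j not in aligned_tgt:
                add((i, j))
    return alignment


def count_crossings(alignment: Set[Link]) -> int:
    """Hypothetical helper: number of crossing link pairs (0 = monotonic)."""
    return sum(1 for (i, j), (k, l) in combinations(sorted(alignment), 2)
               if (i - k) * (j - l) < 0)
```

For example, symmetrising `{(0, 0), (1, 1), (2, 1)}` with `{(0, 0), (1, 1), (1, 2)}` grows the intersection into `{(0, 0), (1, 1), (1, 2), (2, 1)}`, which contains one crossing pair, so a strictly monotonic aligner could not reproduce it exactly. The "and" in final-and is the design choice that keeps the final step conservative: a leftover link is accepted only when neither of its words is already aligned.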
