Towards Improving English-Latvian Translation: A System Comparison and a New Rescoring Feature

Translation into languages with relatively free word order has received much less attention than translation into fixed word order languages such as English, or into analytical languages such as Chinese. At the same time, such translation tasks are among the most difficult challenges for machine translation (MT), and intuitively there seems to be room for improvement by reflecting the free word order structure of the target language. This paper presents a comparative study of two alternative approaches to statistical machine translation (SMT) and their application to English-to-Latvian translation. Furthermore, it proposes a novel feature intended to reflect the relatively free word order of Latvian and successfully applies it at the n-best list rescoring step. Moving beyond the automatic translation quality scores typically reported in MT research papers, we also present a manual error analysis of the MT systems' output that sheds light on the advantages and disadvantages of the SMT systems under consideration.
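The abstract does not define the proposed word order feature, so the following is only a minimal sketch of generic n-best list rescoring. It assumes that each hypothesis is ranked by a log-linear (weighted sum) score and that the new feature enters as one additional weighted term. `Hypothesis`, `rescore`, the feature names `lm` and `tm`, and all numeric values are hypothetical placeholders. They are not the paper's feature or data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    """One n-best entry: a candidate translation and its decoder feature scores."""
    text: str
    features: Dict[str, float] = field(default_factory=dict)


def rescore(
    nbest: List[Hypothesis],
    weights: Dict[str, float],
    extra_feature: Callable[[Hypothesis], float],
    extra_weight: float,
) -> List[Hypothesis]:
    """Re-rank an n-best list with a log-linear score extended by one extra feature.

    score(h) = sum_k weights[k] * h.features[k] + extra_weight * extra_feature(h)
    """
    def score(h: Hypothesis) -> float:
        base = sum(weights[name] * value for name, value in h.features.items())
        return base + extra_weight * extra_feature(h)

    # Highest combined score first; sorted() is stable, so ties keep decoder order.
    return sorted(nbest, key=score, reverse=True)


# Toy usage: feature names, weights and values are illustrative only.
nbest = [
    Hypothesis("a b c", {"lm": -10.0, "tm": -4.0}),
    Hypothesis("a c b", {"lm": -10.5, "tm": -4.0}),
]
weights = {"lm": 1.0, "tm": 0.5}
# Hypothetical word order scores standing in for the paper's (unspecified) feature.
order_scores = {"a b c": -2.0, "a c b": 0.0}

baseline = rescore(nbest, weights, lambda h: order_scores[h.text], extra_weight=0.0)
rescored = rescore(nbest, weights, lambda h: order_scores[h.text], extra_weight=1.0)
print(baseline[0].text)  # a b c
print(rescored[0].text)  # a c b
```

With the extra weight set to zero, the decoder's own ranking is kept. A nonzero weight lets the added feature promote a different word order, which is the effect a rescoring feature is meant to have. In practice, the extra weight would be tuned on held-out data together with the other weights.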
