Chinese-Uyghur statistical machine translation: The initial explorations

In this paper, we present the results of initial explorations into a phrase-based statistical machine translation system for a new language pair, Chinese-Uyghur. The two languages differ greatly: Chinese is written in largely logographic characters, so morpheme-level processing does not apply to it, whereas Uyghur is an agglutinative language with very productive inflectional and derivational word-formation processes. To make the two sides more similar, we reorder Chinese sentence structures from SVO to SOV and split Uyghur words into morphemes. The experiments show that reordering Chinese sentence structure and choosing an appropriate splitting granularity for Uyghur can effectively improve the performance of the translation system.
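The two preprocessing ideas named in the abstract can be illustrated with a minimal sketch. This is not the system described in the paper: the role tags (`S`, `V`, `O`), the suffix list, the `+` marker, and the function names `reorder_svo_to_sov` and `split_suffixes` are all assumptions made for illustration, and the example strings are toy inputs rather than a linguistic analysis. A real pipeline would likely rely on a parser for reordering and a morphological analyzer or unsupervised segmenter for splitting.

```python
from typing import List, Tuple


def reorder_svo_to_sov(tagged: List[Tuple[str, str]]) -> List[str]:
    """Move tokens tagged 'V' after all other tokens, keeping relative order.

    `tagged` is a list of (token, role) pairs; the role labels are hypothetical.
    """
    non_verbs = [tok for tok, role in tagged if role != "V"]
    verbs = [tok for tok, role in tagged if role == "V"]
    return non_verbs + verbs


def split_suffixes(word: str, suffixes: List[str], max_splits: int) -> List[str]:
    """Greedily strip up to `max_splits` known suffixes from the end of `word`.

    `max_splits` stands in for splitting granularity: 0 keeps the word whole,
    larger values yield finer-grained units.
    """
    stripped: List[str] = []
    stem = word
    by_length = sorted(suffixes, key=len, reverse=True)
    while len(stripped) < max_splits:
        # Prefer the longest matching suffix; never strip the whole word.
        match = next(
            (s for s in by_length if stem.endswith(s) and len(stem) > len(s)),
            None,
        )
        if match is None:
            break
        stem = stem[: -len(match)]
        stripped.insert(0, "+" + match)
    return [stem] + stripped


if __name__ == "__main__":
    # Toy Chinese sentence with hand-assigned role tags (assumed input format).
    print(reorder_svo_to_sov([("我", "S"), ("爱", "V"), ("你", "O")]))
    # Toy suffix list; three granularity settings for the same word.
    for n in (0, 1, 2):
        print(n, split_suffixes("kitablarda", ["lar", "da"], n))
```

Varying `max_splits` shows why granularity matters: coarser splitting keeps more distinct word forms (sparser data), while finer splitting produces longer token sequences on the Uyghur side.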
