A Statistical Method for Translating Chinese into Under-resourced Minority Languages

To improve statistical machine translation between Chinese and minority languages, most of which are under-resourced and differ from Chinese in word order and in morphological richness, this paper proposes a method that combines source-side syntactic information with target-side morphological information to reduce both differences at once. First, reordering rules are extracted automatically from the word alignments and the phrase structure trees of the source language, and are used to adjust the source-side word order. Then, a morphological segmentation method based on a Hidden Markov Model is applied to obtain morphological information for the target language. The experiments use Chinese-Mongolian translation as an example. A morpheme-level statistical machine translation system, built on the reordered source side and the segmented target side, achieves an improvement of 2.1 BLEU points over the standard phrase-based system.
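The abstract does not give the details of the rule extraction step, so the following is only a minimal sketch of one common way to derive source-side reordering rules from word alignments and phrase structure trees. It assumes that each tree node is described by its label, its child labels, and its child token spans, and that children are reordered by the mean target position of the words they align to. The function names (`child_order`, `extract_rules`, `apply_rule`), the data layout, and the toy alignment are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def child_order(child_spans, alignment):
    """Return the child permutation implied by target-side positions.

    child_spans: list of (start, end) source token spans, end exclusive.
    alignment:   set of (source_index, target_index) pairs.
    Returns None if some child has no aligned word (assumption: skip the node).
    """
    means = []
    for start, end in child_spans:
        targets = [t for s, t in alignment if start <= s < end]
        if not targets:
            return None
        means.append(sum(targets) / len(targets))
    # sorted() is stable, so children with equal means keep their source order.
    return tuple(sorted(range(len(child_spans)), key=lambda i: means[i]))


def extract_rules(nodes, alignment, counts=None):
    """Count (label, child_labels, permutation) rules over a list of tree nodes.

    nodes: iterable of (label, child_labels, child_spans) triples.
    """
    counts = Counter() if counts is None else counts
    for label, child_labels, child_spans in nodes:
        order = child_order(child_spans, alignment)
        if order is not None:
            counts[(label, tuple(child_labels), order)] += 1
    return counts


def apply_rule(children, order):
    """Reorder a node's children according to a permutation."""
    return [children[i] for i in order]


# Toy example (assumed data): a verb-object source sentence whose
# target side is verb-final, so source token 1 (verb) aligns to target
# position 2 and source token 2 (object) aligns to target position 1.
alignment = {(0, 0), (1, 2), (2, 1)}
vp_node = ("VP", ["VV", "NP"], [(1, 2), (2, 3)])

rules = extract_rules([vp_node], alignment)
order = child_order(vp_node[2], alignment)
reordered = apply_rule(["读", "书"], order)
```

In this toy case the extracted rule swaps the verb and the object under `VP`, which moves the source sentence toward a verb-final order. A real system would presumably filter rules by frequency or by an alignment-quality measure before applying them; that choice is left open here.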
