Log-linear Models for Uyghur Segmentation in Spoken Language Translation

To alleviate data sparsity in spoken Uyghur machine translation, we propose a morphological segmentation approach based on a log-linear model. Instead of learning the model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken language translation using both bilingual and monolingual corpora. The model combines several features: a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that the proposed segmentation model for Uyghur spoken translation improves the BLEU score by 1.6 points over the state-of-the-art baseline.
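As a rough illustration of how a log-linear model can combine such features, the sketch below scores candidate segmentations of a single word as a weighted sum of feature values and normalizes the scores with a softmax. It is not the authors' implementation: the candidate segmentations, the three feature values per candidate, and the weights are all made-up placeholders, and the function name `log_linear_probs` is hypothetical.

```python
import math

def log_linear_probs(candidates, features, weights):
    """Return P(candidate) proportional to exp(sum_i weight_i * feature_i(candidate))."""
    scores = [
        sum(w * h for w, h in zip(weights, features[cand]))
        for cand in candidates
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Placeholder candidate segmentations of one word (not real Uyghur data).
candidates = [
    ("stem+sfx1+sfx2",),          # unsegmented
    ("stem", "+sfx1+sfx2"),       # one split
    ("stem", "+sfx1", "+sfx2"),   # two splits
]

# Toy feature values per candidate, assumed for illustration only:
# (CRF log-score, bilingual alignment score, suffix-word co-occurrence score)
features = {
    candidates[0]: (-2.0, 0.2, 0.1),
    candidates[1]: (-1.0, 0.6, 0.4),
    candidates[2]: (-0.7, 0.5, 0.8),
}

# Assumed feature weights; in practice these would be tuned on held-out data.
weights = (1.0, 0.5, 0.5)

probs = log_linear_probs(candidates, features, weights)
best = candidates[probs.index(max(probs))]
print(best)
```

With these toy numbers the fully segmented candidate receives the highest score. The point of the formulation is that the monolingual CRF score is only one term, so the bilingual and co-occurrence features can shift the preferred segmentation.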
