A tree-based approach for English-to-Turkish translation

In this paper, we present a tree-based methodology for English-to-Turkish translation. Our approach analyzes source-side (English) trees and applies structural modification rules to derive the corresponding target-side (Turkish) trees. We also use morphological analysis to obtain candidate root words and apply tree-based rules to generate the agglutinated target words. Compared with earlier work on English-to-Turkish translation using phrase-based models, our approach obtains higher BLEU scores. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, raises BLEU from a baseline of 12.8 to 21.4, a 67% relative improvement, with all scores averaged over 10-fold cross-validation. As future work, the selection of correct word senses and structural rules needs to be improved.
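The idea of permuting subtrees and then replacing words can be illustrated with a minimal Python sketch. It is not the rule set or algorithm used in this work: the `Node` class, the two entries in `RULES`, and the toy `LEXICON` are all hypothetical, chosen only to show how a verb-object subtree can be reordered into the object-verb order typical of Turkish before leaf words are substituted. Morphological generation of agglutinated forms is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A constituency-tree node; `word` is set only on leaves."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""

    def is_leaf(self) -> bool:
        return not self.children


# Hypothetical permutation rules (assumption, not the paper's rules):
# (parent label, child labels) -> new order of the children.
RULES: Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]] = {
    ("VP", ("VBD", "NP")): (1, 0),  # verb-object -> object-verb
    ("PP", ("IN", "NP")): (1, 0),   # preposition -> postposition
}

# Toy lexicon (assumption): an empty string drops the word entirely.
LEXICON: Dict[str, str] = {
    "the": "",
    "cat": "kedi",
    "drank": "içti",
    "milk": "süt",
}


def permute(node: Node) -> Node:
    """Reorder children bottom-up wherever a rule matches."""
    if node.is_leaf():
        return node
    children = [permute(child) for child in node.children]
    key = (node.label, tuple(child.label for child in children))
    order: Optional[Tuple[int, ...]] = RULES.get(key)
    if order is not None:
        children = [children[i] for i in order]
    return Node(node.label, children)


def replace_words(node: Node) -> Node:
    """Replace each leaf word via the lexicon; unknown words pass through."""
    if node.is_leaf():
        return Node(node.label, word=LEXICON.get(node.word.lower(), node.word))
    return Node(node.label, [replace_words(child) for child in node.children])


def leaves(node: Node) -> List[str]:
    """Collect non-empty leaf words from left to right."""
    if node.is_leaf():
        return [node.word] if node.word else []
    return [w for child in node.children for w in leaves(child)]


def translate(tree: Node) -> str:
    """Permute subtrees first, then substitute words."""
    return " ".join(leaves(replace_words(permute(tree))))


# "the cat drank milk" as a small Penn-Treebank-style tree.
tree = Node("S", [
    Node("NP", [Node("DT", word="the"), Node("NN", word="cat")]),
    Node("VP", [
        Node("VBD", word="drank"),
        Node("NP", [Node("NN", word="milk")]),
    ]),
])

print(translate(tree))
```

Keeping the permutation step separate from word replacement mirrors the two-stage description above: structure is settled first, so lexical choices can be made against the target-side word order. The reported relative gain follows directly from the two scores: (21.4 − 12.8) / 12.8 ≈ 0.67.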