Constructing a Turkish-English Parallel TreeBank

In this paper, we report our preliminary efforts in building an English-Turkish parallel treebank corpus for statistical machine translation. For the corpus, we manually generated parallel trees for about 5,000 sentences from the Penn Treebank. The English sentences in our set have a maximum of 15 tokens, including punctuation. We constrained tree translation to two operations: reordering the children of a node and replacing the leaf nodes with appropriate glosses. We also report on the tools that we built and used in our tree translation task.
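The constraint on translated trees can be illustrated with a small sketch. The code below is not the tooling described in the paper; it is a minimal example under our own assumptions: the `Node` class, the `shape` and `is_permitted_translation` helpers, and the example sentence pair are all hypothetical, and we assume that node labels (including part-of-speech tags) are kept unchanged while only child order and leaf words may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A constituency tree node (hypothetical representation)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on leaf nodes


def shape(node):
    """Signature of a subtree that ignores child order and leaf words."""
    if not node.children:
        return (node.label,)
    return (node.label, tuple(sorted(shape(c) for c in node.children)))


def is_permitted_translation(src, tgt):
    """True if tgt differs from src only by child reordering and leaf glosses."""
    return shape(src) == shape(tgt)


def leaves(node):
    """Leaf words in left-to-right order."""
    if not node.children:
        return [node.word]
    return [w for c in node.children for w in leaves(c)]


# Illustrative English tree: "I saw Ali"
en = Node("S", [
    Node("NP", [Node("PRP", word="I")]),
    Node("VP", [
        Node("VBD", word="saw"),
        Node("NP", [Node("NNP", word="Ali")]),
    ]),
])

# Turkish counterpart: children of VP are swapped (verb-final order)
# and every leaf is replaced by a gloss.
tr = Node("S", [
    Node("NP", [Node("PRP", word="Ben")]),
    Node("VP", [
        Node("NP", [Node("NNP", word="Ali'yi")]),
        Node("VBD", word="gördüm"),
    ]),
])

# A tree that relabels a node is outside the permitted operations.
bad = Node("S", [
    Node("NP", [Node("PRP", word="Ben")]),
    Node("NP", [Node("NNP", word="Ali'yi")]),
])

print(is_permitted_translation(en, tr))
print(is_permitted_translation(en, bad))
print(" ".join(leaves(tr)))
```

Comparing order-insensitive signatures keeps the check simple: any permutation of children at any depth yields the same signature, while adding, deleting, or relabeling a node changes it.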
