Representation of Morphosyntactic Units and Coordination Structures in the Turkish Dependency Treebank

This paper presents preliminary conclusions from an ongoing effort to construct a new dependency representation framework for Turkish. We aim for the framework to accommodate the highly agglutinative morphology of Turkish and to allow the annotation of unedited web data, and we shape our decisions around these two considerations. First, we describe a novel syntactic representation for morphosyntactic sub-word units (namely, inflectional groups (IGs) in Turkish) that allows inter-IG relations to be discerned with perfect accuracy without having to hide lexical information. Second, we investigate alternative annotation schemes for coordination structures and present a scheme that outperforms the one in the Turkish Treebank (Oflazer et al., 2003), with a nearly 11% increase in recall scores, in terms of both parsing accuracy and compatibility with colloquial language.

[1] Christopher D. Manning et al. The Stanford Typed Dependencies Representation, 2008, CF+CDPE@COLING.

[2] Joakim Nivre et al. Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines, 2006, CoNLL.

[3] Ozan Arkan Can et al. Multiword Expressions in Statistical Dependency Parsing, 2011, SPMRL@IWPT.

[4] Joakim Nivre et al. An Efficient Algorithm for Projective Dependency Parsing, 2003, IWPT.

[5] Gökhan Tür et al. Statistical Morphological Disambiguation for Agglutinative Languages, 2000, COLING.

[6] Chih-Jen Lin et al. LIBSVM: A Library for Support Vector Machines, 2011, TIST.

[7] Sabine Buchholz et al. CoNLL-X Shared Task on Multilingual Dependency Parsing, 2006, CoNLL.

[8] Kemal Oflazer et al. Two-level Description of Turkish Morphology, 1993, EACL.

[9] Reut Tsarfaty et al. Word-Based or Morpheme-Based? Annotation Strategies for Modern Hebrew Clitics, 2008, LREC.

[10] Detmar Meurers et al. Revisiting the Impact of Different Annotation Schemes on PCFG Parsing: A Grammatical Dependency Evaluation, 2008.

[11] Dilek Z. Hakkani-Tür et al. Building a Turkish Treebank, 2003.

[12] Joakim Nivre et al. Talbanken05: A Swedish Treebank with Phrase Structure and Dependency Annotation, 2006, LREC.

[13] Yuji Matsumoto. MaltParser: A Language-Independent System for Data-Driven Dependency Parsing, 2005.

[14] Kemal Oflazer. Two-level Description of Turkish Morphology, 1993.

[15] Kemal Oflazer et al. Dependency Parsing of Turkish, 2008, CL.

[16] Joakim Nivre et al. Analyzing and Integrating Dependency Parsers, 2011, CL.

[17] Daniel Zeman et al. Coordination Structures in Dependency Treebanks, 2013, ACL.

[18] Joakim Nivre et al. Comparing the Influence of Different Treebank Annotations on Dependency Parsing, 2010, LREC.