IMST: A Revisited Turkish Dependency Treebank

In this paper, we present a critical analysis of the dependency annotation framework used in the METU-Sabancı Treebank (MST) and propose new annotation schemes that alleviate the issues we identify. We then describe our reannotation of the treebank from the ground up using the proposed schemes, and compare the consistency of the two versions via cross-validation with a dependency parser. In our experiments, the reannotated version of the original treebank, which we call the ITU-METU-Sabancı Treebank (IMST), achieves a labeled attachment score of 75.3% and an unlabeled attachment score of 83.7%, surpassing the corresponding MST scores of 65.9% and 76.0% by 9.4 and 7.7 percentage points, respectively.
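
For readers unfamiliar with the two metrics, the following is a minimal sketch of how unlabeled and labeled attachment scores are commonly computed from gold and predicted parses. It is not the evaluation code used for the reported experiments; the function name `attachment_scores`, the `(head, label)` token representation, and the relation labels in the example are assumptions made for illustration only.

```python
from typing import List, Tuple

# Each token is represented as (head index, dependency label).
# Head index 0 denotes the artificial root. This layout is an assumption.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions in [0, 1] over aligned token lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted token lists must have equal length")
    if not gold:
        raise ValueError("cannot score an empty token list")

    total = len(gold)
    # UAS: the predicted head matches the gold head.
    uas_hits = sum(1 for (g_head, _), (p_head, _) in zip(gold, pred) if g_head == p_head)
    # LAS: both the predicted head and the predicted label match the gold ones.
    las_hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return uas_hits / total, las_hits / total


# Hypothetical three-token sentence; the labels are illustrative only.
gold = [(2, "SUBJECT"), (0, "PREDICATE"), (2, "OBJECT")]
pred = [(2, "SUBJECT"), (0, "PREDICATE"), (2, "MODIFIER")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.3f}, LAS = {las:.3f}")
```

In this toy example every predicted head is correct but one label differs, so UAS is 1.0 and LAS is 2/3. In a cross-validation setting, such scores would typically be computed on each held-out fold and then averaged.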
