Hungarian Dependency Treebank

In this paper, we present the process of developing the first Hungarian Dependency Treebank. First, we briefly review the dependency grammars we considered important in developing our treebank. Second, we mention existing dependency corpora for other languages. Third, we present the steps of converting the Szeged Treebank into dependency-tree format: starting from the original phrase-structure treebank, we produced dependency trees by automatic conversion, then checked and corrected them, thereby creating the first manually annotated dependency corpus for Hungarian. We also discuss in detail the two major sets of problems, namely coordination and predicative nouns and adjectives. Fourth, we give statistics on the treebank: so far, we have completed the annotation of business news, newspaper articles, legal texts and texts on informatics, and we plan to convert the entire corpus into dependency-tree format. Finally, we outline some applications of the resource: among others, the database may be used in information extraction and machine translation.
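The automatic conversion step can be illustrated with the common head-rule (head-percolation) technique for turning phrase-structure trees into dependency trees. The Python sketch below is not the authors' converter: the tag set, the HEAD_RULES table and all function names are hypothetical, chosen only to show how the lexical head of each phrase becomes the governor of the heads of its non-head daughters.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A constituent: a phrase with children, or a leaf carrying a word."""
    label: str                                  # phrase label or POS tag (hypothetical tag set)
    children: List[Node] = field(default_factory=list)
    word: Optional[str] = None                  # set only on leaves
    index: Optional[int] = None                 # 1-based token position, leaves only


# Hypothetical head rules: phrase label -> (search direction, preferred head labels).
HEAD_RULES = {
    "CP": ("left", ["V", "VP"]),
    "VP": ("left", ["V"]),
    "NP": ("right", ["N", "NP"]),
    "ADJP": ("right", ["ADJ"]),
}


def head_child(node: Node) -> Node:
    """Pick the head daughter of a phrase using HEAD_RULES."""
    direction, preferred = HEAD_RULES.get(node.label, ("left", []))
    kids = node.children if direction == "left" else node.children[::-1]
    for label in preferred:
        for kid in kids:
            if kid.label == label:
                return kid
    return kids[0]  # fallback: first child in the search direction


def lexical_head(node: Node) -> int:
    """Follow head daughters down to a leaf and return its token index."""
    while node.word is None:
        node = head_child(node)
    return node.index


def to_dependencies(root: Node) -> Dict[int, int]:
    """Map each token index to its head's index (0 = artificial root)."""
    heads = {lexical_head(root): 0}

    def attach(node: Node) -> None:
        if node.word is not None:
            return
        head = lexical_head(node)
        for kid in node.children:
            kid_head = lexical_head(kid)
            if kid_head != head:
                heads[kid_head] = head  # non-head daughters depend on the phrase head
            attach(kid)

    attach(root)
    return heads


if __name__ == "__main__":
    # "Péter olvassa a könyvet." ("Peter is reading the book."), punctuation omitted
    tree = Node("CP", [
        Node("NP", [Node("N", word="Péter", index=1)]),
        Node("V", word="olvassa", index=2),
        Node("NP", [Node("Det", word="a", index=3),
                    Node("N", word="könyvet", index=4)]),
    ])
    print(to_dependencies(tree))  # prints {2: 0, 1: 2, 4: 2, 3: 4}
```

In this toy run the verb becomes the root, both noun phrases attach to it, and the article attaches to its noun. A clause with no verbal daughter, such as a Hungarian nominal predicate without an overt copula, silently falls through to the fallback rule, and coordinated conjuncts receive whatever head the rules happen to select; these are the kinds of cases where a naive rule-based conversion needs manual checking.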
