Three Syntactic Formalisms for Data-Driven Dependency Parsing of Croatian

This paper presents a new syntactic formalism for dependency parsing of Croatian and its implementation in the SETimes Dependency Treebank of Croatian (the Setimes.Hr Treebank). Its new syntactic tagset is designed to improve dependency parsing accuracy, with special emphasis on the main syntactic categories such as predicates, subjects and objects. The treebank is compared with two versions of the Croatian Dependency Treebank (HOBS): one with explicit encoding of subordinate syntactic conjunctions and one without. Both manual annotation quality and dependency parsing accuracy were evaluated. Inter-annotator agreement improved: Cohen’s kappa coefficient for label attachment, κ(LA), peaked at 0.92, exceeding the two HOBS versions by 0.036 and 0.081 points. With a standard graph-based dependency parser, overall parsing accuracy reached a labeled attachment score (LAS) of 77.49, which is 2.99 and 5.78 points above the two HOBS versions.
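
For readers unfamiliar with the two measures reported above, the following minimal Python sketch shows how they are conventionally computed. It is an illustration under stated assumptions, not the evaluation code used for the treebank: it assumes each token's annotation is a (head index, label) pair, that κ(LA) is Cohen's kappa computed over such pairs, and the function names and toy labels (`Sb`, `Pred`, `Obj`, `Atr`) are hypothetical.

```python
from collections import Counter


def cohen_kappa(annotations_a, annotations_b):
    """Cohen's kappa for two annotators over the same sequence of tokens."""
    if len(annotations_a) != len(annotations_b) or not annotations_a:
        raise ValueError("annotation sequences must be non-empty and equal in length")
    n = len(annotations_a)
    # Observed agreement: fraction of tokens where both annotators chose the same category.
    observed = sum(a == b for a, b in zip(annotations_a, annotations_b)) / n
    # Chance agreement: product of each annotator's marginal proportions, summed over categories.
    counts_a = Counter(annotations_a)
    counts_b = Counter(annotations_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical category everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def labeled_attachment_score(gold, predicted):
    """Percentage of tokens whose predicted head and label both match the gold annotation."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("token sequences must be non-empty and equal in length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Toy example (hypothetical data): four tokens, each annotated as (head index, label).
annotator_a = [(2, "Sb"), (0, "Pred"), (2, "Obj"), (3, "Atr")]
annotator_b = [(2, "Sb"), (0, "Pred"), (2, "Obj"), (2, "Atr")]

kappa = cohen_kappa(annotator_a, annotator_b)
las = labeled_attachment_score(annotator_a, annotator_b)
```

In this toy example the two annotations differ only in the head of the last token, so three of four tokens agree; treating the first annotation as gold gives a LAS of 75.0.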