In this paper, we present the first results of the parallel Czech discourse annotation in the Prague Dependency Treebank 2.0. Having established an annotation scenario for capturing semantic relations that cross the sentence boundary in a discourse, and having annotated the first sections of the treebank according to these guidelines, we now report on the results of the first evaluation of these manual annotations. We give an overview of the annotation process itself, which we believe is to a large degree language-independent and therefore accessible to any discourse researcher. Next, we describe the measurement of inter-annotator agreement and, most importantly, we classify and analyze the most common types of annotator disagreement and propose solutions for the next phase of the annotation. The annotation is carried out on dependency trees (on the tectogrammatical layer); this approach is quite novel and brings some advantages when interpreting the syntactic structure of the discourse units.
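The abstract does not say which agreement statistic is used, so the following is only an illustrative sketch of one common choice, Cohen's kappa, which corrects the raw agreement of two annotators for the agreement expected by chance. The function name `cohen_kappa` and the relation labels (`reason`, `contrast`, `condition`) are hypothetical and are not taken from the annotation guidelines.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two
    # annotators' marginal proportions, summed over all labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels assigned by two annotators to six discourse relations.
annotator_1 = ["reason", "contrast", "reason", "condition", "contrast", "reason"]
annotator_2 = ["reason", "contrast", "condition", "condition", "reason", "reason"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on four of six items, but kappa is noticeably lower than the raw agreement because part of that agreement would be expected by chance given how often each label is used.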