The Prague Dependency Treebank: Crossing the Sentence Boundary

The units processed by both automatic and manual tagging procedures are sentences (as they occur in the texts of the corpus), but the human annotators are instructed to assign (disambiguated) structures according to the meaning of each sentence in its environment, taking contextual (and factual) information into account. In this paper we focus on two issues: how to capture (i) the topic-focus articulation, one of the fundamental properties of sentence structure, which is related to the use of the sentence in a broader context, be it suprasentential or situational, and (ii) the coreferential links in the text.