Annotation Lexicons: Using the Valency Lexicon for Tectogrammatical Annotation

We present a formalization of the valency theory (Panevova, 1974) that fits the stratificational representation scheme used in the Prague Dependency Treebank. The notion of a lexicon as a repository of “static” (invariable, or context-independent) information is formally presented; a different type of lexicon is used at every layer of sentence representation, with a formal link to this representation (and thus to its annotation). To show how such a lexicon can be used in the annotation process itself, we also describe an automatic procedure that uses information from a valency lexicon for partial annotation of a corpus at the tectogrammatical layer. When adding nodes to the tectogrammatical representation of sentences, we substantially increase recall at the cost of a small decrease in precision.