This manual is intended as an introduction to the practice of syntactic tagging within the framework of the Prague Dependency Treebank (henceforth PDT). After a brief Introduction, a list of the symbols used is given (Sect. 1), followed by a description of the automatic procedure dealing with grammatemes (Sect. 2.1) and by instructions covering the further transduction (non-automatic, for the time being) of morphemic and analytic data to the tectogrammatical level. Section 2.2.1 concerns morphological grammatemes, and the subsequent sections (2.2.2-2.6) cover what is expected to matter most to the majority of annotators: functors and syntactic grammatemes. The concluding Section 3 treats the topic-focus articulation.

[...] Čermák and in cooperation with other research institutions) is conceived as a three-layer system of tags (Hajič, Hajičová, Rosen 1996). The individual layers can be characterized as follows:

(i) morphemic tagging, capturing relatively disambiguated values of morphemic categories. The result of a full morphemic analysis is also available, i.e., the complete sets of values of individual forms without disambiguation: e.g., the form dobrým gets "I.SG or D.PL", yet the tag records just one of the two possibilities, chosen according to the given context;

(ii) syntactic tags at the so-called analytic level, capturing the functions of individual word forms as they are expressed in the surface shape of the sentence. In the analytic tree structures (ATSs), every word token and punctuation mark has a corresponding node and is analyzed with respect to its POS and morphemic value, as well as its main syntactic function ('analytic functor', 'Afun'); the Afun values Subj, Obj and Adv are not classified in a more subtle way;

(iii) syntactic tags at the tectogrammatical level, in tectogrammatical tree structures (TGTSs), rendering the deep (underlying, tectogrammatical) structure of the sentence, i.e., its syntactic structure proper (with a detailed classification of functors, see below).
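
The three layers described above can be pictured as three kinds of records. The Python sketch below is only an illustration under stated assumptions: all class and field names are hypothetical and do not reflect PDT's actual file format, the functor inventory is not modelled, and the choice of I.SG for dobrým assumes some context that selects it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MorphTag:
    """Layer (i): morphemic tagging of one word form (hypothetical structure)."""
    form: str
    candidates: List[str]  # full morphemic analysis, without disambiguation
    chosen: str            # the one value selected for the given context

    def __post_init__(self) -> None:
        # The disambiguated tag must be one of the analysed possibilities.
        if self.chosen not in self.candidates:
            raise ValueError(f"{self.chosen!r} is not a candidate for {self.form!r}")


@dataclass
class AnalyticNode:
    """Layer (ii): one node per word token or punctuation mark in an ATS."""
    morph: MorphTag
    afun: str                     # analytic functor, e.g. "Subj", "Obj", "Adv"
    parent: Optional[int] = None  # index of the governing node; None for the root


@dataclass
class TectoNode:
    """Layer (iii): a node of a TGTS carrying a finer-grained functor."""
    lemma: str
    functor: str  # the real inventory is defined in the manual, not here
    grammatemes: Dict[str, str] = field(default_factory=dict)
    children: List["TectoNode"] = field(default_factory=list)


# "dobrým" is ambiguous between I.SG and D.PL; the tag keeps a single value.
# Assumption: the surrounding context selects the instrumental singular.
dobrym = MorphTag(form="dobrým", candidates=["I.SG", "D.PL"], chosen="I.SG")
```

Keeping the undisambiguated candidate set next to the chosen value mirrors the remark that the full morphemic analysis remains available alongside the tag.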