The requirements on the depth and precision of annotation vary with the intended uses of a corpus, but it is now commonly accepted that the standard annotations of surface structure are only the first step in a more ambitious research program, one aimed at creating advanced resources for a wide range of natural language processing systems and at testing and further enriching linguistic and computational theories. Among the several directions in which we believe standard annotation systems should go (and in some cases already attempt to go) beyond POS tagging or shallow syntactic annotation, the present contribution characterizes the following four: (i) a predicate-argument representation of the underlying syntactic relations, basically corresponding to a rooted tree that can be univocally linearized; (ii) the inclusion of information structure by very simple means (the left-to-right order of the nodes and three attribute values); (iii) relating this underlying structure (which renders the "linguistic meaning," i.e., the semantically relevant counterparts of the grammatical means of expression) to certain central aspects of referential semantics (reference assignment and coreferential relations); and (iv) the handling of word sense disambiguation. The first three issues are documented in the present paper on the basis of our experience with developing the structure and annotation scenario of the Prague Dependency Treebank, which provides syntactico-semantic annotation of large text segments from the Czech National Corpus and is based on a solid theoretical framework.
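Points (i) to (iii) describe a data structure, so a small sketch may help make them concrete. The Python below is a minimal illustration under stated assumptions, not the Prague Dependency Treebank's actual format or tooling: the class `Node` and its fields `functor`, `tfa`, `order` and `antecedent` are hypothetical names; the three information-structure values are assumed to be `t` and `c` (contextually bound, non-contrastive and contrastive) and `f` (contextually non-bound); and the tree is assumed to be projective, so that placing each node's dependents to its left or right by their `order` yields one unique linearization.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Assumed information-structure values: "t"/"c" contextually bound
# (non-contrastive / contrastive), "f" contextually non-bound.
TFA_VALUES = {"t", "c", "f"}


@dataclass
class Node:
    """Hypothetical node of an underlying dependency tree (illustrative only)."""
    lemma: str
    functor: str                         # dependency relation, e.g. "PRED", "ACT", "PAT"
    tfa: str                             # one of TFA_VALUES
    order: int                           # position in the underlying left-to-right order
    children: list[Node] = field(default_factory=list)
    antecedent: Optional[Node] = None    # coreference link to an earlier node, if any

    def __post_init__(self) -> None:
        if self.tfa not in TFA_VALUES:
            raise ValueError(f"unknown tfa value: {self.tfa!r}")


def linearize(node: Node) -> list[Node]:
    """In-order projection of a projective tree: dependents with a smaller
    `order` precede their governor, the others follow it."""
    left = sorted((c for c in node.children if c.order < node.order), key=lambda c: c.order)
    right = sorted((c for c in node.children if c.order > node.order), key=lambda c: c.order)
    out: list[Node] = []
    for child in left:
        out.extend(linearize(child))
    out.append(node)
    for child in right:
        out.extend(linearize(child))
    return out


def coref_chain(node: Node) -> list[str]:
    """Follow antecedent links back to the first mention (cycle-safe)."""
    chain = [node.lemma]
    seen = {id(node)}
    while node.antecedent is not None and id(node.antecedent) not in seen:
        node = node.antecedent
        seen.add(id(node))
        chain.append(node.lemma)
    return chain


# Illustrative tree for "John promised Mary that he would come".
john = Node("John", "ACT", "t", 1)
he = Node("he", "ACT", "t", 4, antecedent=john)
come = Node("come", "PAT", "f", 5, children=[he])
mary = Node("Mary", "ADDR", "f", 3)
root = Node("promise", "PRED", "f", 2, children=[john, mary, come])

sequence = linearize(root)
print([n.lemma for n in sequence])                   # ['John', 'promise', 'Mary', 'he', 'come']
print([n.lemma for n in sequence if n.tfa == "f"])   # ['promise', 'Mary', 'come']
print(coref_chain(he))                               # ['he', 'John']
```

Keeping the left-to-right order as a plain integer per node, rather than as a separate list, means the information-structure ordering travels with the tree itself, which mirrors the "very simple means" of point (ii); the coreference link in point (iii) is likewise just a pointer between nodes.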