Diatheses in the Czech Valency Lexicon PDT-Vallex

An important design consideration in all lexicons, whether human-oriented or designed for computer processing, is the variability of forms in which the lexical units described in the lexicon entries can occur in natural language utterances. If all such forms and variations were listed independently in the lexicon, its size would be enormous and it would be hard to maintain (every change would have to be copied to many entries). These problems multiply in lexicons for computerized natural language applications, where entries must be described explicitly and formally in full detail. As an inherent part of the Prague Dependency Treebank project ([9]; for its theoretical background, see the work of Sgall et al. [33]), a valency lexicon called PDT-Vallex ([10], [39], [40]) has been created and is publicly available, with over 8800 verb senses and their corresponding valency frames, fully linked to the treebank. When a particular verb sense is used in a diathetic expression (passive construction, reciprocity, resultative or dispositional modality, etc.), the surface expression of verb complements also changes ([40]). While the basic form “transformations” are well known, it is less obvious how to describe them for all the modalities, especially for the purposes of computer processing, where everything must be stated explicitly. We have found that these transformations can be described by a set of rules. The rules make it possible to keep only a canonical (i.e., active-voice) valency frame in the lexicon entry and to derive from it the surface expression constraints for all the diatheses covered. This formalization has been used in the formal checking of the Prague Dependency Treebank project, and it is used in other current projects as well.
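The idea of storing only a canonical frame and deriving the other surface forms by rule can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Slot` representation, the function names, and the single simplified passive rule (nominative Actor becomes instrumental, accusative Patient becomes nominative) are hypothetical and are not the actual PDT-Vallex data format or rule set.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Slot:
    """One valency slot: a functor plus its allowed surface forms.

    Hypothetical representation; forms are Czech case numbers as strings
    ("1" nominative, "3" dative, "4" accusative, "7" instrumental).
    """
    functor: str
    forms: tuple


def passive_rule(frame):
    """Simplified, illustrative passive rule (an assumption, not the
    published rule set): ACT in nominative -> instrumental,
    PAT in accusative -> nominative, all other slots unchanged."""
    out = []
    for slot in frame:
        if slot.functor == "ACT" and "1" in slot.forms:
            out.append(replace(slot, forms=("7",)))
        elif slot.functor == "PAT" and "4" in slot.forms:
            out.append(replace(slot, forms=("1",)))
        else:
            out.append(slot)
    return tuple(out)


# Hypothetical rule table: one transformation per covered diathesis.
RULES = {"passive": passive_rule}


def surface_frame(canonical, diathesis):
    """Return the surface constraints for a diathesis, derived from the
    canonical (active-voice) frame stored in the lexicon entry."""
    if diathesis == "active":
        return canonical
    return RULES[diathesis](canonical)


# Canonical frame of a hypothetical ditransitive verb sense.
canonical = (
    Slot("ACT", ("1",)),
    Slot("PAT", ("4",)),
    Slot("ADDR", ("3",)),
)
passive = surface_frame(canonical, "passive")
```

With this layout the lexicon entry holds only `canonical`; adding support for another diathesis means adding one function to `RULES` rather than editing every entry.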

[1] František Daneš, et al. Větné vzorce v češtině, 1987.

[2] Petr Sgall, et al. The Meaning Of The Sentence In Its Semantic And Pragmatic Aspects, 1986.

[3] Jarmila Panevová, et al. Formy a funkce ve stavbě české věty, 1980.

[4] Naďa Svozilová, et al. Slovesa pro praxi: valenční slovník nejčastějších českých sloves, 1997.

[5] Petr Pajas, et al. PDT-VALLEX: Creating a Large-coverage Valency Lexicon for Treebank Annotation, 2003.

[6] Miroslav Grepl. Příruční mluvnice češtiny, 1995.

[7] Beth Levin, et al. English Verb Classes and Alternations: A Preliminary Investigation, 1993.

[8] Karel Pala, et al. Valence českých sloves, 1997.

[9] Igor Mel’čuk, et al. Dependency Syntax: Theory and Practice, 1987.

[10] Jan Hajic, et al. Annotation Lexicons: Using the Valency Lexicon for Tectogrammatical Annotation, 2003, Prague Bull. Math. Linguistics.

[11] Lucien Tesnière. Éléments de syntaxe structurale, 1959.

[12] Zdenka Uresova. The verbal valency in the Prague Dependency Treebank from the annotator's point of view, 2005.

[13] Markéta Lopatková, Zdeněk Žabokrtský, Václava Kettnerová, et al. Valenční slovník českých sloves, 2008.

[14] Changes in Valency Structure of Verbs: Grammar vs. Lexicon, 2009.

[15] Vladimír Kadlec, et al. Exploitation of the VerbaLex Verb Valency Lexicon in the Syntactic Analysis of Czech, 2006, TSD.

[16] Gramatické prostředky hierarchizace sémantické struktury věty, 1983.

[17] Zdenek Zabokrtský, et al. Synthesis of Czech Sentences from Tectogrammatical Trees, 2006, TSD.

[18] Alexis Nasr, et al. SuperTagging and Full Parsing, 2004, TAG+.

[19] Daniel Gildea, et al. The Proposition Bank: An Annotated Corpus of Semantic Roles, 2005, CL.

[20] L. H. Babby. Voice and Diathesis in Slavic, 1998.

[21] František Štícha. Utváření a hierarchizace struktury větného znaku, 1984.

[22] James Pustejovsky, et al. The Generative Lexicon, 1995, CL.