Romanian Valence Dictionary in XML Format

Valence dictionaries are dictionaries in which logical predicates (most often verbs) are inventoried along with semantic and syntactic information about the roles of the arguments they combine with, as well as the syntactic restrictions these arguments must obey. In this article we present the initial stage of the project “Syntactic and semantic database in XML format: an HPSG representation of verb valences in Romanian”. Its aim is the development of a valence dictionary in XML format for a set of 3000 Romanian verbs. Valences are specified for each sense of each verb, together with an illustrative example, possible argument alternations, and a set of multiword expressions in which the verb occurs with that sense. The grammatical formalism we use is Head-driven Phrase Structure Grammar (HPSG), which offers one of the most comprehensive frameworks for encoding various types of linguistic information for lexical items. XML is the most appropriate mark-up language for describing information structured in the HPSG framework. The project can later be extended to cover all Romanian verbs (around 7000) as well as other predicates (nouns, adjectives, prepositions).
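To make the described entry structure concrete, the following is a minimal sketch of how one verb sense might be serialized to XML. It is not the project's actual schema: every element and attribute name (`verb`, `sense`, `valence`, `arg`, `example`, `alternation`, `mwe`, `role`, `cat`, `case`) is a hypothetical choice, and the linguistic content for the verb "a da" (to give) is illustrative only.

```python
import xml.etree.ElementTree as ET


def build_entry(lemma, senses):
    """Build a <verb> element with one <sense> child per sense.

    All tag and attribute names are hypothetical; the abstract does not
    specify the schema used by the project.
    """
    verb = ET.Element("verb", lemma=lemma)
    for number, data in enumerate(senses, start=1):
        sense = ET.SubElement(verb, "sense", id=str(number), gloss=data["gloss"])
        # The valence lists the arguments and their syntactic restrictions.
        valence = ET.SubElement(sense, "valence")
        for arg in data["args"]:
            ET.SubElement(valence, "arg", role=arg["role"],
                          cat=arg["cat"], case=arg["case"])
        ET.SubElement(sense, "example").text = data["example"]
        for alternation in data.get("alternations", []):
            ET.SubElement(sense, "alternation").text = alternation
        for mwe in data.get("mwes", []):
            ET.SubElement(sense, "mwe").text = mwe
    return verb


# Illustrative data only, not taken from the project's dictionary.
entry = build_entry("da", [{
    "gloss": "give",
    "args": [
        {"role": "agent", "cat": "NP", "case": "nom"},
        {"role": "theme", "cat": "NP", "case": "acc"},
        {"role": "recipient", "cat": "NP", "case": "dat"},
    ],
    "example": "Ion dă o carte Mariei.",
    "alternations": ["recipient: NP[dat] ~ PP[la]"],
    "mwes": ["a da o mână de ajutor"],
}])

xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Keeping the example, alternations, and multiword expressions inside each `sense` element mirrors the statement that valences are specified per sense rather than per verb; a real HPSG encoding would likely carry richer feature structures than the flat attributes used here.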
