Paper information - The Sensem Corpus: a Corpus Annotated at the Syntactic and Semantic Level

The primary aim of the SENSEM project (Sentence Semantics, BFF2003-06456) is the construction of a lexical database illustrating the syntactic and semantic behavior of each of the senses of the 250 most frequent verbs of Spanish. With this objective in mind, we are currently building an annotated corpus consisting of sentences extracted from the electronic version of the newspaper El Periódico de Catalunya, totalling approximately 1 million words, with 100 examples of each verb. By the time of the conference, we will be close to completing the annotation of 25,000 sentences, which amounts to a corpus of roughly 800,000 words. Approximately 400,000 of them will have been revised. We expect to make the corpus publicly available by the end of 2006.

Laura Alonso Alemany | Irene Castellón | Ana Fernández Montraveta | Glòria Vázquez | Joan Antoni Capilla

[1] Martha Palmer, et al. Adding predicate argument structure to the Penn TreeBank, 2002.

[2] Mitchell P. Marcus, et al. Adding Semantic Annotation to the Penn TreeBank, 1998.

[3] Laura Alonso Alemany, et al. Detección automática de errores en el Corpus SenSem, 2007.

[4] Martha Palmer, et al. From TreeBank to PropBank, 2002, LREC.

[5] Charles J. Fillmore, et al. The Mechanisms of “Construction Grammar”, 1988.

[6] José Mª García, et al. Verbs of cognition in Spanish: constructional schemas and reference points, 2004.

[7] Ana María Fernández Montraveta, et al. La semántica oracional del español: perspectiva desde el léxico, 2005.

[8] Ana Fernández, et al. SENSEM: base de datos verbal del español, 2003.

[9] Ana María Fernández Montraveta, et al. Interfaz de explotación del corpus SenSem, 2007.