The SenSem Corpus: an annotated corpus for Spanish and Catalan with information about aspectuality, modality, polarity and factuality

Abstract: In this paper, we present the annotation scheme used in the SenSem corpora (SSC) for Spanish and Catalan to encode information about aspectuality, modality, polarity and factuality. For aspectuality, the most relevant contribution is the encoding of dynamicity, telicity and iterativity. For factuality, we present a more fine-grained annotation of uncertainty that distinguishes impossible events, completely uncertain events and neutral uncertain events. Although factuality annotation for Spanish has been provided elsewhere, the Catalan SSC is the only corpus that provides it for Catalan.
