SFU ReviewSP-NEG: a Spanish corpus annotated with negation for sentiment analysis. A typology of negation patterns

Abstract. In this paper, we present SFU ReviewSP-NEG, the first freely available, wide-coverage Spanish corpus annotated with negation. We describe the methodology applied in annotating the corpus, including the tagset, the linguistic criteria and the inter-annotator agreement tests. We also include a complete typology of negation patterns in Spanish. This typology has the advantage of being easy to express as a tagset for corpus annotation: the types are clearly defined, which avoids ambiguity in the annotation process, and they provide wide coverage (i.e. they account for all the cases occurring in the corpus). The annotations are built on the SFU ReviewSP corpus. The resulting corpus consists of 400 reviews, 221,866 words and 9,455 sentences, of which 3,022 sentences contain at least one negation structure.
