Spanish corpora for sentiment analysis: a survey

Corpora play an important role in training machine learning systems for sentiment analysis. However, Spanish is underrepresented among these resources, most of which primarily contain English texts. This paper describes 20 Spanish-language text corpora collected to support different sentiment analysis tasks, ranging from polarity classification to emotion categorization. We present a new framework for characterizing corpora. It includes a number of features for analyzing resources at both the corpus level and the document level. Besides depicting the overall landscape of Spanish corpora, this survey helps sentiment analysis practitioners select the most suitable resources.
