Polarity classification for Spanish tweets using the COST corpus

It was not until 2010 that businesses, politicians and the general public in Spain began to realize the potential of Twitter. This has spurred research interest in extracting knowledge from Twitter. This paper addresses the lack of resources for Twitter sentiment analysis in Spanish by studying different features and machine learning algorithms for classifying the polarity of Twitter posts. The result is a new corpus of Spanish tweets called COST, on which we have carried out a wide-ranging experiment using different machine learning algorithms. Furthermore, we have tested the influence of different weighting schemes for unigrams, of eliminating stop-words and of applying a stemming process.
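
As a rough illustration of what comparing weighting schemes for unigrams can look like, the following minimal sketch builds unigram feature vectors under three commonly used schemes (binary occurrence, raw term frequency and TF-IDF), with optional stop-word removal. The abstract does not name the schemes or the stop-word list that were evaluated, so the scheme names, the function names `tokenize` and `weight_unigrams`, and the tiny `STOP_WORDS` set are all assumptions made for this example. Stemming is left out because the Python standard library provides no stemmer.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"el", "la", "de", "que", "y", "es", "muy"}


def tokenize(text, remove_stop_words=False):
    """Lowercase and split on whitespace; optionally drop stop-words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def weight_unigrams(docs, scheme="tfidf", remove_stop_words=False):
    """Return one {unigram: weight} dict per document.

    scheme is one of "binary", "tf" or "tfidf" (hypothetical names).
    """
    tokenized = [tokenize(d, remove_stop_words) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each unigram appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        if scheme == "binary":
            vec = {t: 1.0 for t in term_freq}
        elif scheme == "tf":
            vec = {t: float(c) for t, c in term_freq.items()}
        elif scheme == "tfidf":
            # Unsmoothed idf: log(N / df); terms present in every document get 0.
            vec = {t: c * math.log(n_docs / doc_freq[t])
                   for t, c in term_freq.items()}
        else:
            raise ValueError(f"unknown weighting scheme: {scheme}")
        vectors.append(vec)
    return vectors


# Two made-up example tweets, not taken from the COST corpus.
tweets = ["la película es muy buena", "la película es muy mala"]
for scheme in ("binary", "tf", "tfidf"):
    print(scheme, weight_unigrams(tweets, scheme, remove_stop_words=True))
```

The resulting dictionaries could then be passed to any classifier that accepts sparse features; under TF-IDF, unigrams shared by every tweet receive zero weight, so only the discriminating words remain informative.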
