DeustoTech Internet at TASS 2015: Sentiment Analysis and Polarity Classification in Spanish Tweets

This article describes the system we presented at the TASS 2015 sentiment analysis workshop. Our system addresses Task 1 of the workshop, which consists of performing automatic sentiment analysis to determine the global polarity of a set of tweets in Spanish. To do this, our system relies on a supervised model, Linear Support Vector Machines, combined with several polarity lexicons. We analyze the influence of the different linguistic features and the different n-gram sizes on the algorithm's performance. We also present the results obtained, the various tests conducted, and a discussion of the results.
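
As a rough illustration of the kind of representation the abstract refers to, the sketch below builds a sparse feature dictionary from word n-grams and polarity-lexicon counts, which is the sort of input a linear SVM could be trained on. It is not the authors' implementation: the tiny lexicons, the whitespace tokenizer, the feature names, and the function names `word_ngrams` and `extract_features` are all assumptions made for the example.

```python
from collections import Counter

# Toy polarity lexicons (hypothetical; the lexicons actually used by the
# system are not listed in this abstract).
POSITIVE = {"bueno", "genial", "feliz"}
NEGATIVE = {"malo", "horrible", "triste"}


def word_ngrams(tokens, n_max=2):
    """Return all word n-grams of size 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def extract_features(tweet, n_max=2):
    """Map a tweet to a sparse feature dict: n-gram counts plus lexicon counts."""
    # Naive lowercase + whitespace tokenization (an assumption for this sketch).
    tokens = tweet.lower().split()
    features = Counter("ng=" + g for g in word_ngrams(tokens, n_max))
    # Count how many tokens appear in each polarity lexicon.
    features["lex_pos"] = sum(t in POSITIVE for t in tokens)
    features["lex_neg"] = sum(t in NEGATIVE for t in tokens)
    return dict(features)


feats = extract_features("Un dia genial pero el final fue horrible")
```

Changing `n_max` is one simple way to compare different n-gram sizes. The resulting dictionaries would still need to be vectorized before being passed to a linear SVM implementation.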
