On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages

Millions of microtexts are published on Twitter every day. Identifying the sentiment they express can help measure public mood, customer satisfaction with a product, or support for a social event. In this context, polarity classification is the subfield of sentiment analysis concerned with determining whether a text is objective or subjective and, if it is subjective, whether it conveys a positive or a negative opinion. Most polarity detection techniques take the individual terms of a text into account, in some cases together with a degree of linguistic knowledge, but they rarely consider the syntactic relations between words. This article explores how combining lexical, syntactic, and psychometric information can help classify the polarity of Spanish tweets. We provide an evaluation from both shallow and deep linguistic perspectives. Empirical results show that syntactic approaches outperform purely lexical models when large training sets are used to build the classifier, but this trend is reversed when small training collections are used.
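To make the difference between a purely lexical view and a syntactic view concrete, here is a minimal Python sketch (not the authors' implementation) of how a tweet that has already been tokenized, lemmatized, POS-tagged and dependency-parsed could be turned into lexical, psychometric, and syntactic features for a supervised classifier. Everything named here is an assumption for illustration: the `Token` structure, the dependency labels (`neg`, `iobj`, `mod`), the toy `PSYCHO_LEXICON`, and the feature-naming scheme are hypothetical. A real pipeline would take the parse from a Spanish dependency parser and the categories from a psychometric dictionary such as a Spanish LIWC lexicon.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position of the token in the tweet
    form: str     # surface form
    lemma: str
    pos: str      # coarse part-of-speech tag
    head: int     # index of the syntactic head (0 = root)
    deprel: str   # dependency label (hypothetical label set)


# Hypothetical toy psychometric lexicon: lemma -> categories.
# A real system would load a full resource (e.g. a Spanish LIWC dictionary).
PSYCHO_LEXICON = {
    "gustar": {"posemo", "affect"},
    "odiar": {"negemo", "affect"},
    "feliz": {"posemo", "affect"},
}


def lexical_features(tokens):
    """Bag of lowercased lemmas: a purely lexical view of the tweet."""
    return Counter(f"lem:{t.lemma.lower()}" for t in tokens)


def psychometric_features(tokens):
    """Count how many tokens fall into each psychometric category."""
    counts = Counter()
    for t in tokens:
        for cat in PSYCHO_LEXICON.get(t.lemma.lower(), ()):
            counts[f"psy:{cat}"] += 1
    return counts


def syntactic_features(tokens):
    """Dependency triples (head, relation, dependent), each emitted twice:
    fully lexicalized, and with the head backed off to its POS tag."""
    by_idx = {t.idx: t for t in tokens}
    counts = Counter()
    for t in tokens:
        if t.head == 0:
            continue  # the root has no head word, so it yields no triple
        h = by_idx[t.head]
        dep = t.lemma.lower()
        counts[f"dep:{h.lemma.lower()}_{t.deprel}_{dep}"] += 1
        counts[f"dep:{h.pos}_{t.deprel}_{dep}"] += 1
    return counts


def extract_features(tokens, use_syntax=True):
    """Merge the feature groups; use_syntax=False gives a lexical-only model."""
    feats = lexical_features(tokens) + psychometric_features(tokens)
    if use_syntax:
        feats += syntactic_features(tokens)
    return feats


# "No me gusta nada" ("I don't like it at all"), with a hand-written parse.
TWEET = [
    Token(1, "No", "no", "ADV", 3, "neg"),
    Token(2, "me", "me", "PRON", 3, "iobj"),
    Token(3, "gusta", "gustar", "VERB", 0, "ROOT"),
    Token(4, "nada", "nada", "PRON", 3, "mod"),
]

features = extract_features(TWEET)
```

In this example the lexical and psychometric features only see the positive verb *gustar*, whereas the dependency feature `dep:gustar_neg_no` records that it is negated. Fully lexicalized triples are sparse, so the POS back-off variant is included as one illustrative (assumed) way to generalize them across head words; the resulting feature counts could then be fed to any standard supervised learner.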
