Dataset Classification Combining Wide-coverage Lexical Resources and Text Features

In this paper we describe our participation in the TASS 2017 shared task on polarity classification of Spanish tweets. For this task we built a classification model based on the Lingmotif Spanish lexicon and combined it with a number of formal text features, both general and specific to computer-mediated communication (CMC), as well as single-word and n-gram keywords, achieving above-average results across all three datasets. We report the results of our experiments with different combinations of these feature sets and two machine learning algorithms (logistic regression and SVM).
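The abstract only names the ingredients of the model. As a rough illustration of how lexicon scores, formal/CMC text features and keyword hits can be concatenated into one feature vector and passed to a linear classifier, the following is a minimal pure-Python sketch. Everything concrete in it is an assumption: `TOY_LEXICON` is a hypothetical eight-word stand-in for the Lingmotif Spanish lexicon, `POS_KEYWORDS`/`NEG_KEYWORDS` and the training tweets are invented, the feature set is illustrative rather than the one used in the paper, and the task is reduced to a binary positive/negative decision fitted with stochastic gradient descent.

```python
import math
import re

# HYPOTHETICAL toy lexicon standing in for the Lingmotif Spanish lexicon;
# the real resource is far larger and is not reproduced here.
TOY_LEXICON = {
    "bueno": 1.0, "genial": 2.0, "feliz": 1.5, "encanta": 2.0,
    "malo": -1.0, "horrible": -2.0, "triste": -1.5, "odio": -2.0,
}
# HYPOTHETICAL single-word and n-gram keywords (in practice mined from training data).
POS_KEYWORDS = {"gracias", "buen día"}
NEG_KEYWORDS = {"lunes", "no me gusta"}

POS_EMOTICON = re.compile(r"[:;]-?[)D]")
NEG_EMOTICON = re.compile(r":-?\(")


def ngrams(tokens, n_max=3):
    """All word n-grams of length 1..n_max, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def extract_features(tweet):
    """Lexicon scores + formal/CMC text features + keyword hits for one tweet."""
    tokens = re.findall(r"\w+", tweet.lower())
    scores = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    grams = ngrams(tokens)
    letters = [c for c in tweet if c.isalpha()]
    return [
        sum(s for s in scores if s > 0),                           # positive lexicon mass
        -sum(s for s in scores if s < 0),                          # negative lexicon mass
        len(scores) / max(len(tokens), 1),                         # lexicon coverage
        tweet.count("!"),                                          # exclamation marks
        len(POS_EMOTICON.findall(tweet)),                          # positive emoticons
        len(NEG_EMOTICON.findall(tweet)),                          # negative emoticons
        sum(c.isupper() for c in letters) / max(len(letters), 1),  # uppercase ratio
        sum(g in POS_KEYWORDS for g in grams),                     # positive keyword hits
        sum(g in NEG_KEYWORDS for g in grams),                     # negative keyword hits
    ]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=500):
    """Binary logistic regression fitted with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 (positive) or 0 (negative) from the linear decision function."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


# HYPOTHETICAL toy training data: 1 = positive, 0 = negative.
TRAIN = [
    ("Me encanta este día, genial!!", 1),
    ("Qué día tan bueno :)", 1),
    ("Estoy muy feliz :D", 1),
    ("Odio los lunes :(", 0),
    ("Qué servicio tan horrible", 0),
    ("Un día triste y malo", 0),
]
X = [extract_features(t) for t, _ in TRAIN]
y = [label for _, label in TRAIN]
w, b = train_logreg(X, y)
```

In practice the same feature vectors would normally be handed to a library implementation, for example scikit-learn's `LogisticRegression` or `LinearSVC`, rather than a hand-written trainer; the hand-written version is used here only to keep the sketch dependency-free.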
