Twitter Sentiment Detection via Ensemble Classification Using Averaged Confidence Scores

We reproduce three classification approaches with diverse feature sets for the task of classifying the sentiment expressed in a given tweet as positive, neutral, or negative. We also combine the reproduced approaches in an ensemble that averages the individual classifiers’ confidence scores for the three classes and decides sentiment polarity based on these averages. Our experimental evaluation on SemEval data shows that our re-implementations slightly outperform their respective originals. Moreover, in the SemEval Twitter sentiment detection tasks of 2013 and 2014, the ensemble of reproduced approaches would have ranked in the top 5 among 50 participants. An error analysis shows that the ensemble classifier makes few severe misclassifications, such as identifying a positive sentiment in a negative tweet or vice versa. Instead, it tends to misclassify non-neutral tweets as neutral, which can be viewed as the safest option.
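The averaging step described in the abstract can be illustrated with a minimal Python sketch. It assumes each classifier reports one confidence score per class. The names `LABELS`, `average_confidences`, and `ensemble_predict`, the example scores, and the tie-breaking rule are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List

# The three sentiment classes named in the abstract.
LABELS = ("positive", "neutral", "negative")


def average_confidences(scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-class confidence scores of several classifiers."""
    if not scores:
        raise ValueError("at least one classifier's scores are required")
    return {
        label: sum(s[label] for s in scores) / len(scores)
        for label in LABELS
    }


def ensemble_predict(scores: List[Dict[str, float]]) -> str:
    """Return the class with the highest averaged confidence.

    Assumption: ties are broken by the order of LABELS, because max()
    returns the first maximal element. The paper does not specify this.
    """
    averaged = average_confidences(scores)
    return max(LABELS, key=lambda label: averaged[label])


# Hypothetical confidence scores from three classifiers for one tweet.
example_scores = [
    {"positive": 0.6, "neutral": 0.3, "negative": 0.1},
    {"positive": 0.2, "neutral": 0.5, "negative": 0.3},
    {"positive": 0.3, "neutral": 0.5, "negative": 0.2},
]

averaged_example = average_confidences(example_scores)
predicted_label = ensemble_predict(example_scores)
```

In this made-up example, one classifier favors the positive class, but the averaged scores favor neutral, so the ensemble predicts neutral. This is one way such an ensemble can drift toward the neutral class when its members disagree.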
