CENTEMENT at SemEval-2018 Task 1: Classification of Tweets using Multiple Thresholds with Self-correction and Weighted Conditional Probabilities

In this paper we present our contribution to SemEval-2018: a multi-label emotion classifier for Arabic and English tweets. We took part in Task 1, “Affect in Tweets”, specifically subtask E-c: Detecting Emotions (multi-label classification). Our method preprocesses the tweets and builds word vectors, combined with a self-correction step that removes noise. We also use emotion-specific thresholds. For the final submission we tried a range of thresholds and selected the configuration that performed best. Our system was evaluated on the Arabic and English datasets provided by the task organisers, where it ranked 2nd on the Arabic dataset (out of 14 entries) and 12th on the English dataset (out of 35 entries).
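To illustrate what emotion-specific thresholds mean in practice, the following is a minimal sketch, not the system described in this paper. It assumes that some classifier has already assigned each tweet a score per emotion, and it picks, for each emotion separately, the candidate threshold that maximises F1 on held-out data. The function names (`f1`, `pick_thresholds`, `classify`), the selection criterion, and the toy scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 for one binary label; returns 0.0 when there is nothing to count."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pick_thresholds(
    scores: Dict[str, List[float]],
    gold: Dict[str, List[int]],
    candidates: Sequence[float],
) -> Dict[str, float]:
    """For each emotion, keep the first candidate threshold with the best F1."""
    best: Dict[str, float] = {}
    for emotion, emotion_scores in scores.items():
        best_t, best_f1 = candidates[0], -1.0
        for t in candidates:
            pred = [1 if s >= t else 0 for s in emotion_scores]
            score = f1(gold[emotion], pred)
            if score > best_f1:
                best_t, best_f1 = t, score
        best[emotion] = best_t
    return best


def classify(tweet_scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Assign every emotion whose score reaches that emotion's own threshold."""
    return [e for e, s in tweet_scores.items() if s >= thresholds[e]]


# Toy held-out data (hypothetical): one score and one gold label per tweet.
dev_scores = {"anger": [0.9, 0.2, 0.7, 0.1], "joy": [0.15, 0.2, 0.05, 0.12]}
dev_gold = {"anger": [1, 0, 1, 0], "joy": [1, 1, 0, 1]}
thresholds = pick_thresholds(dev_scores, dev_gold, [0.1, 0.3, 0.5, 0.7, 0.9])
labels = classify({"anger": 0.5, "joy": 0.05}, thresholds)
```

In this toy setting the scores for the two emotions live on different scales, so a single global cut-off would either miss most joy tweets or over-assign anger; a per-emotion threshold avoids that trade-off.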
