Tw-StAR at SemEval-2018 Task 1: Preprocessing Impact on Multi-label Emotion Classification

In this paper, we describe our contribution to the SemEval-2018 contest. We tackled Task 1 “Affect in Tweets”, subtask E-c “Detecting Emotions (multi-label classification)”. A multi-label classification system, Tw-StAR, was developed to recognize the emotions embedded in Arabic, English and Spanish tweets. To handle the multi-label classification problem with traditional classifiers, we employed the binary relevance transformation strategy, while a TF-IDF scheme was used to generate the tweets’ features. We investigated applying several preprocessing tasks, both individually and in combination, to further improve the performance. The results showed that specific combinations of preprocessing tasks could significantly improve the evaluation measures. This was later confirmed by the official results, in which our system ranked 3rd on both the Arabic and Spanish datasets and 14th on the English dataset.
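The pairing of binary relevance with TF-IDF features can be illustrated with a minimal sketch. It is not the Tw-StAR implementation: the toy tweets, the emotion labels, all function names, and the nearest-centroid classifier that stands in for the unspecified "traditional classifiers" are assumptions made only for illustration. Binary relevance splits the multi-label problem into one independent yes/no problem per emotion, and each tweet is represented by a normalized TF-IDF vector.

```python
import math
from collections import Counter


def tfidf_fit(docs):
    """Compute smoothed IDF weights from a list of tokenized documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf_vector(doc, idf):
    """Map one tokenized document to an L2-normalized sparse TF-IDF vector."""
    tf = Counter(tok for tok in doc if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def binary_relevance_split(label_sets, labels):
    """Binary relevance: one 0/1 target list per label."""
    return {lab: [int(lab in s) for s in label_sets] for lab in labels}


def centroid(vectors):
    """Average a list of sparse vectors (empty dict if the list is empty)."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds the weights
    return {tok: w / len(vectors) for tok, w in total.items()} if vectors else {}


def dot(a, b):
    """Dot product of two sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def train(vectors, label_sets, labels):
    """Fit one independent binary model per label.

    Assumption: a nearest-centroid rule stands in for whichever
    traditional classifier the system actually used.
    """
    models = {}
    for lab, ys in binary_relevance_split(label_sets, labels).items():
        pos = centroid([v for v, y in zip(vectors, ys) if y == 1])
        neg = centroid([v for v, y in zip(vectors, ys) if y == 0])
        models[lab] = (pos, neg)
    return models


def predict(vec, models):
    """Return every label whose positive centroid is closer than its negative one."""
    return {lab for lab, (pos, neg) in models.items() if dot(vec, pos) > dot(vec, neg)}


# Hypothetical toy data, not taken from the SemEval-2018 datasets.
LABELS = ["joy", "fear", "sadness"]
TWEETS = [
    ("i am so happy today".split(), {"joy"}),
    ("this is scary and sad".split(), {"fear", "sadness"}),
    ("happy but a bit scary".split(), {"joy", "fear"}),
    ("so sad today".split(), {"sadness"}),
]

docs = [tokens for tokens, _ in TWEETS]
label_sets = [labs for _, labs in TWEETS]
idf = tfidf_fit(docs)
models = train([tfidf_vector(d, idf) for d in docs], label_sets, LABELS)
prediction = predict(tfidf_vector("happy".split(), idf), models)
```

Because each label is modeled on its own, a tweet can receive zero, one, or several emotions. In this setup, preprocessing steps such as stemming or stopword removal would be applied to the token lists before `tfidf_fit` is called.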
