SVM-based approach for opinion classification in Arabic written tweets

We propose a machine learning approach for automatically classifying opinions in Twitter texts written in Modern Standard Arabic (MSA). Tweets are classified as positive, negative, neutral, or non-opinion. The features used for opinion classification are mainly linguistic and numeric. Our training data, collected and developed in-house, consists of tweets that preserve their specific markers, such as @usermentions and #hashtags, which are used as tweet-specific features. Four machine learning algorithms were applied to our dataset: Support Vector Machine (SVM), Naive Bayes (NB), J48 decision tree, and Random Forest. The experimental results show that SVM gives the highest F-measure (72%), while the J48 classifier gives the highest precision (70.97%). Our results also indicate that tweet-specific features can significantly improve classification performance compared with other feature combinations.
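As a rough illustration of what tweet-specific features might look like, the sketch below counts @usermentions and #hashtags in a tweet. The abstract does not say how these features were extracted or encoded, so the regular expressions, the function name `tweet_specific_features`, and the feature names are all assumptions made for this example, not the authors' method.

```python
import re

# Hypothetical patterns: the abstract does not specify the extraction rules.
# In Python 3, \w matches Unicode word characters, so Arabic hashtags match too.
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def tweet_specific_features(tweet: str) -> dict:
    """Return simple numeric features derived from Twitter-specific markers."""
    mentions = MENTION_RE.findall(tweet)
    hashtags = HASHTAG_RE.findall(tweet)
    return {
        "n_mentions": len(mentions),          # number of @usermentions
        "n_hashtags": len(hashtags),          # number of #hashtags
        "has_mention": int(bool(mentions)),   # 1 if any mention is present
        "has_hashtag": int(bool(hashtags)),   # 1 if any hashtag is present
    }


if __name__ == "__main__":
    print(tweet_specific_features("@user1 شكرا #خبر #رأي"))
```

Features of this kind could be appended to linguistic features before training a classifier such as an SVM; how the two feature groups are combined in the paper is not stated in the abstract.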
