Word embeddings for Arabic sentiment analysis

Manual feature extraction is a challenging and time-consuming task, especially in a Morphologically Rich Language (MRL) such as Arabic. In this paper, we rely on word embeddings as the main source of features for opinion mining in Arabic text such as tweets, consumer reviews, and news articles. First, we compile a large Arabic corpus from various sources to learn word representations. Second, we train word vectors (embeddings) on the corpus. Third, we use the embeddings in our feature representation to train several binary classifiers that detect subjectivity and sentiment in both Standard Arabic and Dialectal Arabic. We compare our results with other methods in the literature; our approach, which uses no hand-crafted features, achieves slightly better accuracy than the top hand-crafted methods. To support reproduction of our results and further work, we publish the data and code used in our experiments.
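The third step, turning word vectors into features for a binary classifier, can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: a document is represented by the average of its word vectors, the classifier is a plain logistic regression trained by stochastic gradient descent, and the toy `EMBEDDINGS` table (with English placeholder tokens and two dimensions) stands in for vectors that would actually be learned from the Arabic corpus. All function names are hypothetical.

```python
import math

# Hypothetical toy embeddings (assumption); real vectors would be learned
# from a large corpus and have far more dimensions.
EMBEDDINGS = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [-0.9, 0.1],
    "awful": [-0.8, 0.3],
    "movie": [0.0, 0.5],
    "food": [0.1, 0.4],
}


def doc_vector(tokens, embeddings, dim=2):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit logistic regression weights with per-example gradient updates."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - label  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(tokens, w, b, embeddings):
    """Return 1 (positive) or 0 (negative) for a tokenized document."""
    x = doc_vector(tokens, embeddings, dim=len(w))
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(score) >= 0.5 else 0


# Tiny labeled set: 1 = positive sentiment, 0 = negative sentiment.
train_docs = [["good", "movie"], ["great", "food"], ["bad", "movie"], ["awful", "food"]]
train_labels = [1, 1, 0, 0]

X = [doc_vector(d, EMBEDDINGS) for d in train_docs]
w, b = train_logreg(X, train_labels)
```

With the trained weights, `predict(["great", "movie"], w, b, EMBEDDINGS)` classifies an unseen combination of known words. The same structure applies to subjectivity detection by changing what the binary labels mean.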
