Syntax-Ignorant N-gram Embeddings for Sentiment Analysis of Arabic Dialects

Arabic sentiment analysis models have employed compositional embedding features to represent Arabic dialectal content. These embeddings are usually composed via ordered, syntax-aware composition functions and learned within deep neural frameworks. Because word order is free and syntax varies across the Arabic dialects, a sentiment analysis system developed for one dialect might not be effective for the others. Here we present syntax-ignorant n-gram embeddings for sentiment analysis of several Arabic dialects. The proposed embeddings are composed and learned using an unordered composition function and a shallow neural model. Five datasets of different dialects were used to evaluate the resulting embeddings on the sentiment analysis task. The results show that our syntax-ignorant embeddings outperform word2vec and both doc2vec variants, as well as hand-crafted system baselines, while remaining competitive with baseline systems that adopt more complex neural architectures.
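
As a rough illustration of what an unordered composition function does, the sketch below averages the embedding vectors of a sentence's tokens, so that any reordering of the same tokens yields the same sentence vector. Averaging, the random initialization, the dimension, and the names `build_embeddings` and `compose_unordered` are assumptions made for illustration; the abstract does not specify the exact composition function or how the embeddings are trained.

```python
import random


def build_embeddings(vocab, dim, seed=0):
    """Assign each token a random vector (hypothetical stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for tok in vocab}


def compose_unordered(tokens, embeddings, dim):
    """Average the vectors of known tokens; token order plays no role."""
    total = [0.0] * dim
    known = 0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        known += 1
        for i, value in enumerate(vec):
            total[i] += value
    if known == 0:
        return total  # all-zero vector when no token is known
    return [t / known for t in total]


DIM = 4  # illustrative dimension, not taken from the paper
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]  # placeholder tokens
emb = build_embeddings(vocab, DIM)

a = compose_unordered(["tok_a", "tok_b", "tok_c", "tok_d"], emb, DIM)
b = compose_unordered(["tok_d", "tok_c", "tok_a", "tok_b"], emb, DIM)

# The two orderings agree up to floating-point rounding.
same = all(abs(x - y) < 1e-9 for x, y in zip(a, b))
print(same)
```

In a full system, a vector like this would be fed to a classifier and the embeddings updated during training; that part is omitted here because the abstract gives no details beyond "a shallow neural model".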
