Pre-trained Word Embeddings for Arabic Aspect-Based Sentiment Analysis of Airline Tweets

Word embeddings have recently become one of the most significant advancements in natural language processing (NLP). In this paper, we compared two word embedding models for aspect-based sentiment analysis (ABSA) of Arabic tweets. The ABSA problem was formulated as a two-step process: aspect detection, followed by sentiment polarity classification of the detected aspects. The two embedding models compared were fastText Arabic Wikipedia and AraVec-Web, both available as pre-trained models. Our corpus consisted of 5K Arabic tweets related to airline services, manually labeled for ABSA, with imbalanced aspect categories. We used a support vector machine classifier for both aspect detection and sentiment polarity classification. Our results indicated that fastText Arabic Wikipedia word embeddings performed slightly better than AraVec-Web.
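The two-step formulation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's actual setup: the 2-dimensional toy vectors (with English placeholder tokens) stand in for a pre-trained embedding model, a tweet is represented by the average of its word vectors, and the classifier is a small linear SVM trained by subgradient descent on the hinge loss. The abstract does not state how tweet features were built or which SVM variant was used.

```python
from typing import Dict, List, Tuple

Vector = List[float]
DIM = 2

# Hypothetical 2-d vectors standing in for a pre-trained embedding model.
# English tokens are placeholders for Arabic ones; the values are invented.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "flight": [1.0, 0.0], "crew": [-1.0, 0.0],
    "delayed": [0.0, -1.0], "punctual": [0.0, 1.0],
    "friendly": [0.0, 1.0], "rude": [0.0, -1.0],
    "late": [0.2, -0.8],
}


def tweet_vector(tokens: List[str]) -> Vector:
    """Average the vectors of known tokens (assumed feature construction)."""
    known = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def train_linear_svm(xs: List[Vector], ys: List[int], epochs: int = 100,
                     lr: float = 0.1, lam: float = 0.01) -> Tuple[Vector, float]:
    """Linear SVM via subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # regularisation step
            if margin < 1.0:                         # hinge-loss step
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model: Tuple[Vector, float], x: Vector) -> int:
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Tiny invented training set: (tokens, aspect label, polarity label).
# Aspect: +1 = "schedule", -1 = "staff". Polarity: +1 = positive, -1 = negative.
TRAIN = [
    (["flight", "delayed"], 1, -1),
    (["flight", "punctual"], 1, 1),
    (["crew", "friendly"], -1, 1),
    (["crew", "rude"], -1, -1),
]
FEATURES = [tweet_vector(tokens) for tokens, _, _ in TRAIN]
ASPECT_MODEL = train_linear_svm(FEATURES, [a for _, a, _ in TRAIN])
POLARITY_MODEL = train_linear_svm(FEATURES, [p for _, _, p in TRAIN])


def analyse(tokens: List[str]) -> Tuple[str, str]:
    """Step 1: detect the aspect. Step 2: classify polarity for that aspect."""
    x = tweet_vector(tokens)
    aspect = "schedule" if predict(ASPECT_MODEL, x) == 1 else "staff"
    polarity = "positive" if predict(POLARITY_MODEL, x) == 1 else "negative"
    return aspect, polarity


if __name__ == "__main__":
    print(analyse(["flight", "late"]))
```

In a realistic setting, the toy dictionary would be replaced by vectors loaded from one of the pre-trained models, and the aspect step would need to handle more than two categories as well as the class imbalance mentioned above.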
