A Micro-Word Based Approach for Arabic Sentiment Analysis

Sentiment analysis of social network data has recently received a great deal of attention. Social networks are characterized by informal language that differs from the standard form of the language. Hence, there is a demand for effective methods to analyze the huge volume of new word variants that appear daily in the online world. In text classification, the vector space model (VSM) is built on a vocabulary list (i.e., the words of the entire training set) and ignores unusual words outside that list, which leads to a partial loss of textual information. To address this challenge, we propose using each pair of neighboring letters in a word as the basic feature unit instead of the word itself. That is, instead of using words in the VSM, the proposed method decomposes each word into a sequence of micro-words, each consisting of exactly two consecutive letters. Two data collections were used to investigate performance: common (i.e., standard-form) Arabic text and uncommon Arabic text obtained from Instagram. For the common text, we used a corpus of 1,500 documents for training and 500 documents for testing. The proposed method was evaluated using latent semantic indexing (LSI) to extract textual features and the cosine similarity measure for classification. The experimental results are promising: the proposed method correctly classifies the testing-set documents with an accuracy of up to 83.6%.
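
The micro-word decomposition can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`micro_words`, `micro_word_counts`, `cosine_similarity`) are hypothetical, the two-letter units are assumed to overlap, whitespace tokenization is assumed, and the LSI step is omitted so that cosine similarity is computed directly on raw micro-word counts.

```python
from collections import Counter
from math import sqrt


def micro_words(word):
    """Split a word into two-letter units (assumption: units overlap)."""
    if len(word) < 2:
        # Assumption: a one-letter word is kept as a single unit.
        return [word] if word else []
    return [word[i:i + 2] for i in range(len(word) - 1)]


def micro_word_counts(text):
    """Count micro-words over all whitespace-separated words in a text."""
    counts = Counter()
    for word in text.split():
        counts.update(micro_words(word))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative input: a standard spelling and an elongated variant.
standard = micro_word_counts("جميل")
variant = micro_word_counts("جمييل")
similarity = cosine_similarity(standard, variant)
```

In a word-level VSM the two spellings above would be unrelated vocabulary entries, whereas their micro-word vectors share most of their units, which is the kind of textual information the approach aims to retain.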
