Building a Sentiment Analysis System Using an Automatically Generated Training Dataset

In this paper, we describe a procedure for automatically extracting annotated positive and negative Arabic tweets. We use the extracted tweets to build a sentiment analysis system based on Naive Bayes with a TF-IDF enhancement. A large training set is necessary for a highly inflected language such as Arabic, because rich inflection makes the data sparse. We present our techniques and describe our experimental system, for which we automatically collected 200 thousand annotated tweets. The evaluation shows that our sentiment analysis system achieves high precision and accuracy compared with existing systems.
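To make the classification step concrete, the following is a minimal sketch of one way to combine Naive Bayes with TF-IDF. The abstract does not say how the two are combined, so the sketch assumes a multinomial Naive Bayes model in which TF-IDF weights replace raw term counts. The function names (`tokenize`, `train_tfidf_nb`, `predict`), the smoothing parameter `alpha`, the smoothed IDF formula, and the whitespace tokenizer are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. A real Arabic pipeline
    # would likely add normalization and stemming.
    return text.lower().split()


def train_tfidf_nb(docs, labels, alpha=1.0):
    """Train multinomial Naive Bayes on TF-IDF weights instead of raw counts."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant) so every seen term has positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vocab = set(idf)

    # Accumulate TF-IDF mass per class and term.
    weights = defaultdict(lambda: defaultdict(float))
    for toks, label in zip(tokenized, labels):
        for term, tf in Counter(toks).items():
            weights[label][term] += tf * idf[term]

    model = {"idf": idf, "vocab": vocab, "log_prior": {}, "log_lik": {}}
    for label, count in Counter(labels).items():
        model["log_prior"][label] = math.log(count / n_docs)
        # Laplace-style smoothing over the whole vocabulary.
        total = sum(weights[label].values()) + alpha * len(vocab)
        model["log_lik"][label] = {
            t: math.log((weights[label].get(t, 0.0) + alpha) / total)
            for t in vocab
        }
    return model


def predict(model, text):
    """Return the label with the highest posterior score for the text."""
    # Terms never seen in training are ignored.
    counts = Counter(t for t in tokenize(text) if t in model["vocab"])
    scores = {}
    for label, prior in model["log_prior"].items():
        log_lik = model["log_lik"][label]
        scores[label] = prior + sum(
            tf * model["idf"][t] * log_lik[t] for t, tf in counts.items()
        )
    return max(scores, key=scores.get)


# Toy English placeholder data; the paper's corpus is Arabic tweets.
train_docs = ["good great happy", "great love good",
              "bad awful sad", "sad hate bad"]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_tfidf_nb(train_docs, train_labels)
print(predict(model, "good happy"))
print(predict(model, "awful sad"))
```

Weighting terms by IDF reduces the influence of words that occur in most tweets regardless of polarity, which is one plausible reason to pair TF-IDF with Naive Bayes on sparse, highly inflected text.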
