A Semi-supervised Corpus Annotation for Saudi Sentiment Analysis Using Twitter

In the literature, limited work has been conducted to develop sentiment resources for the Saudi dialect. The lack of resources such as dialectal lexicons and corpora is one of the major bottlenecks to the successful development of Arabic sentiment analysis models. In this paper, a semi-supervised approach is presented to construct an annotated sentiment corpus for the Saudi dialect using Twitter. The presented approach is primarily based on a list of lexicons built using word embedding techniques such as word2vec. A large corpus extracted from Twitter is annotated automatically and then manually reviewed to exclude incorrectly annotated tweets; the resulting corpus is publicly available. For corpus validation, state-of-the-art classification algorithms (such as Logistic Regression, Support Vector Machine, and Naive Bayes) are applied and evaluated. Experimental results demonstrate that the Naive Bayes algorithm outperformed the other approaches and achieved an accuracy of up to 91%.
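As a rough illustration of the lexicon-driven annotation idea described above, the following minimal Python sketch expands seed sentiment words by embedding similarity and then labels tweets by counting lexicon hits. It is not the authors' implementation: the toy two-dimensional vectors stand in for trained word2vec embeddings, the English tokens stand in for Saudi dialect words, and the names `TOY_VECTORS`, `expand_lexicon`, `annotate`, and the 0.9 similarity threshold are all assumptions made for this example.

```python
import math

# Toy 2-d vectors standing in for trained word2vec embeddings.
# The words and values are hypothetical, chosen only for illustration.
TOY_VECTORS = {
    "good": (1.0, 0.1),
    "great": (0.9, 0.2),
    "lovely": (0.8, 0.3),
    "bad": (-1.0, 0.1),
    "awful": (-0.9, 0.2),
    "poor": (-0.8, 0.3),
    "today": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, threshold=0.9):
    """Add every word whose vector is close to at least one seed word."""
    known_seeds = [s for s in seeds if s in vectors]
    lexicon = set(seeds)
    for word, vec in vectors.items():
        if any(cosine(vec, vectors[s]) >= threshold for s in known_seeds):
            lexicon.add(word)
    return lexicon


def annotate(tweet, positive, negative):
    """Label a tweet by lexicon hits; return None when the counts tie."""
    tokens = tweet.lower().split()
    pos = sum(token in positive for token in tokens)
    neg = sum(token in negative for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # ambiguous: leave for manual review


POSITIVE = expand_lexicon({"good"}, TOY_VECTORS)
NEGATIVE = expand_lexicon({"bad"}, TOY_VECTORS)

if __name__ == "__main__":
    for text in ["The service was great today", "awful and poor", "good but bad"]:
        print(text, "->", annotate(text, POSITIVE, NEGATIVE))
```

In this sketch, tweets with tied or zero lexicon counts are left unlabeled rather than guessed, which mirrors the role of the manual review step: automatic labels cover the clear cases, and ambiguous ones are deferred to a human.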
