Sentiment Analysis Approach Based on Combination of Word Embedding Techniques

Sentiment analysis is a field of research that attracts the attention of companies and governments seeking to understand the opinions of customers and citizens about products, services, policies, and more. With the growing volume of user-generated content on the Web, especially on social networks, textual information has become freely accessible in enormous quantities, which calls for powerful automated analysis tools to extract this kind of information (positive or negative sentiment). In this paper, we present a sentiment analysis approach that relies on pre-trained word embeddings (freely available, high-quality word representation vectors), namely the AraVec and fastText models, and we propose a combination of the two models based on concatenating their vectors. Sentiment classification was performed with six different machine learning algorithms. We find that, in most cases, our proposed method achieves the best accuracy, especially with the NuSVC classifier, a variant of the support vector machine (SVM).
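The abstract does not say how the word vectors of a text are turned into a single feature vector. The following is a minimal sketch under the assumption that each text is represented by the average of its word vectors under each model, and that the two averages are then concatenated before training scikit-learn's `NuSVC`. The toy lookups `aravec_toy` and `fasttext_toy`, the English example tokens (used only for readability), the helper names `average_vector` and `concat_features`, and the `nu`/`kernel` settings are all hypothetical stand-ins, not the authors' data or configuration. In practice the lookups would come from the pre-trained AraVec and fastText models.

```python
import numpy as np
from sklearn.svm import NuSVC

def average_vector(tokens, embeddings, dim):
    """Mean of the vectors of in-vocabulary tokens; a zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def concat_features(tokens, emb_a, dim_a, emb_b, dim_b):
    """Concatenate the averaged document vectors from two embedding models."""
    return np.concatenate([
        average_vector(tokens, emb_a, dim_a),
        average_vector(tokens, emb_b, dim_b),
    ])

# Hypothetical toy stand-ins: 3-d "AraVec" and 4-d "fastText" vectors.
# The first dimension loosely encodes polarity so the toy task is learnable.
aravec_toy = {
    "good": np.array([1.0, 0.2, 0.1]),   "bad": np.array([-1.0, 0.2, 0.1]),
    "great": np.array([0.9, 0.1, 0.3]),  "awful": np.array([-0.9, 0.1, 0.3]),
    "love": np.array([0.8, 0.3, 0.0]),   "hate": np.array([-0.8, 0.3, 0.0]),
    "service": np.array([0.0, 0.5, 0.5]),
}
fasttext_toy = {
    "good": np.array([0.9, 0.1, 0.0, 0.2]),  "bad": np.array([-0.9, 0.1, 0.0, 0.2]),
    "great": np.array([1.0, 0.0, 0.2, 0.1]), "awful": np.array([-1.0, 0.0, 0.2, 0.1]),
    "love": np.array([0.7, 0.2, 0.1, 0.0]),  "hate": np.array([-0.7, 0.2, 0.1, 0.0]),
    "service": np.array([0.0, 0.4, 0.4, 0.4]),
}

# Tiny hypothetical training set: 1 = positive, 0 = negative.
train_docs = [
    ["good", "service"], ["great", "love"], ["love", "service"], ["good", "great"],
    ["bad", "service"], ["awful", "hate"], ["hate", "service"], ["bad", "awful"],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

# Each row is a 3 + 4 = 7-dimensional concatenated feature vector.
X = np.vstack([concat_features(d, aravec_toy, 3, fasttext_toy, 4) for d in train_docs])

# NuSVC is scikit-learn's nu-parameterised SVM classifier; settings here are assumptions.
clf = NuSVC(nu=0.5, kernel="linear")
clf.fit(X, y)

test_docs = [["good", "love", "service"], ["bad", "hate", "service"]]
X_test = np.vstack([concat_features(d, aravec_toy, 3, fasttext_toy, 4) for d in test_docs])
preds = clf.predict(X_test)
print(preds)
```

Concatenation keeps the two models' information side by side rather than mixing it: a 3-dimensional and a 4-dimensional document vector yield a 7-dimensional feature vector, so the classifier can weight dimensions from either model independently. Note that a real fastText model can build vectors for unseen words from character n-grams, whereas the toy dictionary lookup above simply skips unknown tokens.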
