Sentiment Analysis of Arabic Reviews for Saudi Hotels Using Unsupervised Machine Learning

Virtual worlds such as social networking sites, blogs and content communities are increasingly becoming some of the most powerful sources of information about news, markets and industries. These platforms can be used for many purposes because they are rich in feedback, emotions, thoughts and reviews. The main objective of this paper is to cluster Arabic reviews of Saudi hotels into positive and negative clusters for sentiment analysis. We used web scraping to collect Arabic reviews associated only with Saudi hotels from the tourism website TripAdvisor, obtaining 4,604 Arabic reviews in total. TF-IDF weighting was then applied to extract relevant features. We applied an unsupervised learning approach, specifically the K-means and hierarchical clustering algorithms, each with two distance metrics: cosine and Euclidean. Evaluation on our manually labelled test data shows that K-means with cosine distance performed well when all of our preprocessing steps were applied. We conclude that the proposed preprocessing steps play a critical role in Arabic language processing and sentiment analysis.
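
The pipeline summarized above can be sketched in plain Python. It builds TF-IDF features and then clusters them with K-means or hierarchical clustering under either cosine or Euclidean distance. This is a minimal illustration under stated assumptions, not the authors' implementation:

- Whitespace tokenization stands in for the paper's preprocessing steps, which are not reproduced here.
- The smoothed IDF formula, farthest-first seeding for K-means and average linkage for the hierarchical variant are choices made for this sketch.
- The function names (`tfidf_vectors`, `kmeans`, `average_linkage`) and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return dense, L2-normalised TF-IDF vectors for whitespace-tokenised docs.

    Assumption: raw term counts times a smoothed IDF, log((1 + N) / (1 + df)) + 1;
    the exact weighting used in the paper may differ.
    """
    tokenized = [doc.split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against empty docs
        vectors.append([x / norm for x in vec])
    return vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0


def kmeans(points, k, distance, max_iter=100):
    """Lloyd's k-means with a pluggable distance function."""
    # Farthest-first seeding (an assumption; k-means++ is the common randomized
    # choice): start at point 0, then add the point farthest from its nearest centroid.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(distance(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def average_linkage(points, k, distance):
    """Agglomerative (hierarchical) clustering with average linkage, cut at k clusters.

    Naive O(n^3) version, adequate only for small illustrative inputs.
    """
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        return sum(distance(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i still points at the right cluster
        clusters[i] = merged
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


if __name__ == "__main__":
    # Made-up toy reviews, not taken from the TripAdvisor dataset.
    reviews = ["ممتاز نظيف رائع", "ممتاز نظيف", "نظيف رائع",
               "سيء متسخ مزعج", "سيء متسخ", "متسخ مزعج"]
    X = tfidf_vectors(reviews)
    for name, dist in [("cosine", cosine_distance), ("euclidean", euclidean)]:
        print(name, kmeans(X, 2, dist), average_linkage(X, 2, dist))
```

On this toy input the first three reviews share no terms with the last three. Both clustering functions should therefore separate the two groups under either metric. On real reviews, results depend heavily on preprocessing and on the choice of distance. The TF-IDF vectors here are L2-normalised, so for any two documents the squared Euclidean distance equals twice the cosine distance. The two metrics still diverge in K-means because the mean centroids are not unit length.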
