Sentiment analysis using semantic similarity and Hadoop MapReduce

Sentiment analysis, or opinion mining, is the field that analyses people’s opinions, sentiments, evaluations, attitudes, and emotions as expressed in written language; it has become a very active area of scientific research in recent years, especially with the growth of social networks such as Facebook and Twitter. In this paper we propose two new approaches for classifying tweets, that is, for identifying the feeling expressed in a tweet: the first uses three classes (negative, positive, or neutral), and the second uses two classes (negative or positive). The first method computes the semantic similarity between the tweet to classify and three documents, each of which represents one class by containing the words characteristic of that class; the tweet is then assigned the class of the document with which it has the highest semantic similarity. The second method computes, using a new formula that we propose, the semantic similarity between each word of the tweet to classify and the words “positive” and “negative”. Because the analysis can take a long time when the dataset of tweets is very large, we perform it in a parallel and distributed way, using the Hadoop framework with the Hadoop Distributed File System (HDFS) and the MapReduce programming model. The aim of our work is to combine several domains: information retrieval, semantic similarity, opinion mining (sentiment analysis), and big data.
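The first method and its MapReduce formulation can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, not the formulas or data of the paper: the contents of `CLASS_DOCS`, the toy `word_similarity` function (a stand-in for a real semantic similarity measure, such as one based on WordNet), the averaging rule in `tweet_similarity`, and the in-memory `mapper`/`reducer` pair that only imitates how a Hadoop job would split the work.

```python
from collections import Counter

# Hypothetical class documents: each lists words that represent one class.
CLASS_DOCS = {
    "positive": ["good", "great", "happy", "love"],
    "negative": ["bad", "awful", "sad", "hate"],
    "neutral": ["today", "meeting", "report", "weather"],
}

# Toy synonym table (assumption): a real system would score word pairs with
# a semantic similarity measure rather than a hand-written lookup.
TOY_SYNONYMS = {
    "awesome": "great",
    "terrible": "awful",
    "glad": "happy",
    "forecast": "weather",
}


def word_similarity(a, b):
    """Return 1.0 for identical words, 0.8 for toy synonyms, else 0.0."""
    if a == b:
        return 1.0
    if TOY_SYNONYMS.get(a) == b or TOY_SYNONYMS.get(b) == a:
        return 0.8
    return 0.0


def tweet_similarity(tweet_words, doc_words):
    """Average, over the tweet's words, of the best match in the document."""
    if not tweet_words:
        return 0.0
    best = [max(word_similarity(w, d) for d in doc_words) for w in tweet_words]
    return sum(best) / len(best)


def classify(tweet):
    """Assign the class whose document is most similar to the tweet."""
    words = tweet.lower().split()
    scores = {
        label: tweet_similarity(words, doc) for label, doc in CLASS_DOCS.items()
    }
    # On a tie, max() keeps the first label in CLASS_DOCS order.
    return max(scores, key=scores.get)


def mapper(tweet):
    """Map step: emit one (class, 1) pair per tweet."""
    yield classify(tweet), 1


def reducer(pairs):
    """Reduce step: sum the counts emitted for each class."""
    counts = Counter()
    for label, n in pairs:
        counts[label] += n
    return dict(counts)


tweets = [
    "awesome day love it",
    "terrible service hate it",
    "meeting report today",
]
pairs = [pair for t in tweets for pair in mapper(t)]
totals = reducer(pairs)
print(totals)
```

Because each tweet is classified independently of the others, the map step can be spread across many nodes, which is what makes the approach a natural fit for MapReduce when the dataset is large.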
