The impact of natural language preprocessing on big data sentiment analysis

Sentiment analysis determines people's opinions, sentiments and emotions by classifying their written text into positive or negative polarity. It is important for many critical applications, such as decision making and product evaluation. Social networks are one of the main sources of data for sentiment analysis; however, the huge volume of data they produce requires efficient and scalable analysis techniques. MapReduce has proved its efficiency and scalability in handling big data and has therefore attracted many researchers to use it as a processing framework. In this paper, a sentiment analysis method for big data is studied. The method uses the Naïve Bayes algorithm to classify texts into positive and negative polarity. Several linguistic and Natural Language Processing (NLP) preprocessing techniques are applied to a Twitter data set to study their impact on the accuracy of big data classification. The experiments performed indicate that the accuracy of the sentiment analysis is enhanced by 5%, yielding an accuracy of 73% on the Stanford Sentiment data set.
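The method combines NLP preprocessing with Naïve Bayes training expressed as map and reduce steps. What follows is a minimal single-process sketch, under several assumptions. The preprocessing steps shown are illustrative choices and not necessarily the ones evaluated in the paper: lowercasing, URL and mention removal, squeezing repeated characters, and stop-word removal. The map and reduce functions are simulated in plain Python rather than run on Hadoop. The names `preprocess`, `map_phase`, `reduce_phase`, `train`, `classify` and `STOP_WORDS` are hypothetical. The classifier is a multinomial Naïve Bayes with add-one (Laplace) smoothing.

```python
import math
import re
from itertools import groupby
from operator import itemgetter

# Hypothetical, deliberately tiny stop-word list used only for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "i", "to", "and", "of", "it", "this"}

def preprocess(text):
    """Assumed tweet preprocessing steps (illustrative, not the paper's exact pipeline)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)            # remove user mentions
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # squeeze "sooooo" -> "soo"
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]

def map_phase(doc_id, record):
    """Mapper: doc_id is the input key (unused); emit one document count
    per label and one count per (label, token) occurrence."""
    label, text = record
    yield ("__doc__", label), 1
    for token in preprocess(text):
        yield (label, token), 1

def reduce_phase(pairs):
    """Reducer: sort to imitate the shuffle, then sum the values per key."""
    pairs = sorted(pairs, key=itemgetter(0))
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=itemgetter(0))}

def train(records):
    emitted = [kv for i, rec in enumerate(records) for kv in map_phase(i, rec)]
    counts = reduce_phase(emitted)
    labels = {key[1] for key in counts if key[0] == "__doc__"}
    doc_counts = {lab: counts[("__doc__", lab)] for lab in labels}
    token_counts = {lab: {} for lab in labels}
    for (first, second), c in counts.items():
        if first != "__doc__":
            token_counts[first][second] = c
    vocab = {tok for per_label in token_counts.values() for tok in per_label}
    return doc_counts, token_counts, vocab

def classify(text, model):
    """Pick the label with the highest log prior + summed log likelihoods."""
    doc_counts, token_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_tokens = sum(token_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in preprocess(text):
            if tok in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed likelihood P(tok | label)
                score += math.log((token_counts[label].get(tok, 0) + 1)
                                  / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train_data = [
    ("pos", "I love this phone, it is sooooo good! http://t.co/x"),
    ("pos", "Great service and great people @shop"),
    ("neg", "I hate the battery, it is terrible"),
    ("neg", "Awful, terrible support. Never again"),
]
model = train(train_data)
print(classify("great phone, love it", model))  # -> pos
print(classify("terrible battery", model))      # -> neg
```

On a real cluster, the mapper and reducer would run as separate tasks, for example through Hadoop Streaming. The framework would also perform the shuffle that `sorted` plus `groupby` only imitates here. Because the reducer just sums counts, the same summation could also serve as a combiner to cut network traffic.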
