Analysis of Streaming Data Using Big Data and Hybrid Machine Learning Approach

Large volumes of data are generated from multiple sources, and this data contains many hidden patterns and much useful information. Data from social networks consists largely of opinions, which can be mined to extract insights that are valuable from an organizational point of view. In this chapter, the authors store Twitter streaming data in the Hadoop Distributed File System (HDFS) using Flume and then extract it with Apache Hive. Machine learning classification algorithms are then applied to determine the sentiment in this data. A novel hybrid approach that combines the Naive Bayes and Decision Tree algorithms is used to improve the performance of sentiment analysis on streaming Twitter data. Naive Bayes is a simple yet powerful classification algorithm, but it assumes that features are independent. A Decision Tree, which classifies by applying a set of rules, is therefore used in conjunction with it to obtain more accurate results. The two algorithms are combined using the averaging rule. The implemented approach achieved an accuracy of 86.44%, compared with 81.11% for the Naive Bayes classifier alone.
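The averaging rule mentioned in the abstract can be illustrated with a minimal sketch. It assumes each classifier outputs a probability per sentiment class and that the combined score is the plain mean of the two. The abstract does not give the exact formulation, so the function names, class labels, and probability values below are hypothetical and are not the authors' implementation.

```python
from typing import Dict

# Class-probability output of one classifier for one tweet,
# e.g. {"positive": 0.55, "negative": 0.45}.
Probs = Dict[str, float]


def average_rule(nb_probs: Probs, dt_probs: Probs) -> Probs:
    """Average the per-class probabilities of two classifiers.

    A class missing from one classifier's output is treated as
    probability 0.0 for that classifier (an assumption of this sketch).
    """
    labels = sorted(set(nb_probs) | set(dt_probs))
    return {
        label: (nb_probs.get(label, 0.0) + dt_probs.get(label, 0.0)) / 2
        for label in labels
    }


def predict(nb_probs: Probs, dt_probs: Probs) -> str:
    """Return the class with the highest averaged probability.

    Ties are broken by alphabetical label order, because the combined
    dictionary is built from sorted labels.
    """
    combined = average_rule(nb_probs, dt_probs)
    return max(combined, key=combined.get)


# Hypothetical outputs for a single tweet (illustrative values only).
nb_output = {"positive": 0.55, "negative": 0.45}  # Naive Bayes
dt_output = {"positive": 0.20, "negative": 0.80}  # Decision Tree

combined_output = average_rule(nb_output, dt_output)
predicted_label = predict(nb_output, dt_output)
```

In this example Naive Bayes leans slightly positive, but the Decision Tree is strongly negative, so the averaged scores favor the negative class. This shows how a second classifier can overrule a weakly confident Naive Bayes prediction.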
