Analyzing Social Media through Big Data using InfoSphere BigInsights and Apache Flume

Abstract: Social media gives organizations the ability to survey, in real time, the sentiment expressed towards content and events associated with them. The first step of sentiment analysis is the pre-processing of data collected from social media. Most existing research on social media analysis is based on extracting new sentiment-related features. This paper presents the use of Twitter for a number of proposed subjects. Twitter is among the largest social networking websites, and its data grows at an increasing rate every day, which makes it a Big Data source. We then describe in detail how Big Data technology such as InfoSphere BigInsights enables the processing of this data, which is collected from social networks by Apache Flume and stored in Hadoop storage. In addition, we investigate a Big Data platform that collects social media data with Apache Flume and analyzes it with InfoSphere BigInsights. Our paper also integrates the visualization of the analysis results using BigSheets. Finally, an evaluation of the analysis results confirms that the proposed Big Data platform produces better results for social media analysis.
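The pre-processing step mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation, which runs on InfoSphere BigInsights; the function name `preprocess_tweet` and the specific cleaning rules (lowercasing, dropping URLs and user mentions, keeping hashtag words) are assumptions chosen only to show the kind of normalization typically applied to tweets before sentiment analysis.

```python
import re
from typing import List

# Hypothetical cleaning rules; the paper does not specify its exact steps.
URL_RE = re.compile(r"https?://\S+")       # links carry no sentiment words
MENTION_RE = re.compile(r"@\w+")           # user mentions such as @IBM
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")   # punctuation and other symbols


def preprocess_tweet(text: str) -> List[str]:
    """Normalize a raw tweet into a list of lowercase tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")          # keep the hashtag word, drop the marker
    text = NON_WORD_RE.sub(" ", text)
    return text.split()


if __name__ == "__main__":
    print(preprocess_tweet("Loving the new #BigData course by @IBM! https://t.co/abc123"))
```

In a pipeline like the one described, a step of this kind would run on tweets after they have been collected and stored, and before any sentiment features are extracted.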
