A Paralleled Big Data Algorithm with MapReduce Framework for Mining Twitter Data

Recent studies suggest that public opinion expressed in social media may be correlated with various social issues. Discovering what social media data can actually reveal requires data mining, and data mining approaches that can handle massive amounts of data have recently been referred to as big data algorithms. In this paper, we propose a big data algorithm for mining Twitter data. To ensure scalability, the MapReduce framework is adopted to parallelize the proposed algorithm. Experiments demonstrate the potential of the proposed algorithm. Computationally, the parallelized algorithm runs significantly faster, even as the dataset grows. In fact, the acceleration ratio increases as the size of the dataset increases and as the number of Data Nodes increases.
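The abstract does not spell out the algorithm itself, so the following is only a minimal, single-process sketch of the general MapReduce pattern it refers to, using term counting over tweets as a stand-in task. The function names (`map_tweet`, `shuffle`, `reduce_counts`, `run_job`, `acceleration_ratio`) are hypothetical, and the acceleration ratio is assumed here to be the conventional speedup, that is, single-node run time divided by parallel run time; the paper may define it differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_tweet(tweet: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (term, 1) pair for each whitespace-separated term."""
    for term in tweet.lower().split():
        yield term, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(term: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the values collected for one term."""
    return term, sum(counts)


def run_job(tweets: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in one process (no real cluster involved)."""
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    grouped = shuffle(pairs)
    return dict(reduce_counts(term, counts) for term, counts in grouped.items())


def acceleration_ratio(t_single: float, t_parallel: float) -> float:
    """Assumed definition: single-node run time divided by parallel run time."""
    return t_single / t_parallel


if __name__ == "__main__":
    print(run_job(["good day", "Good mood"]))
    print(acceleration_ratio(120.0, 40.0))
```

In a real deployment the map and reduce functions would run on separate nodes and the framework would perform the shuffle; the sketch only shows how the work is divided into independent steps, which is what makes adding nodes worthwhile.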
