Sentiment analysis of Twitter data within big data distributed environment for stock prediction

This paper covers the design, implementation and evaluation of a system that may be used to predict future stock prices based on the analysis of data from social media services. The authors took advantage of large datasets available from the Twitter microblogging platform and of widely available stock market records. Data was collected over three months and processed for further analysis. Machine learning was employed to classify the sentiment of data coming from social networks in order to estimate future stock prices. Calculations were performed in a distributed environment according to the MapReduce programming model. The results of predictions for different time intervals and input datasets are evaluated and discussed, and they indicate the effectiveness of the chosen approach.
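The way sentiment classification fits the MapReduce model can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the word lists, the `classify` rule and all function names are hypothetical, and a small lexicon stands in for the machine-learning classifier that the abstract mentions. The map step emits a (day, sentiment label) pair per tweet and the reduce step averages the labels for each day.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical toy lexicon; it only stands in for a trained classifier.
POSITIVE = {"gain", "up", "bullish", "good"}
NEGATIVE = {"loss", "down", "bearish", "bad"}


def classify(text: str) -> int:
    """Return +1, 0 or -1 depending on which word list dominates the text."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def map_phase(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Map step: emit one (day, sentiment label) pair per (day, tweet) record."""
    for day, text in records:
        yield day, classify(text)


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Reduce step: group the labels by day and average them."""
    grouped = defaultdict(list)
    for day, label in pairs:
        grouped[day].append(label)
    return {day: sum(labels) / len(labels) for day, labels in grouped.items()}


def daily_sentiment(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Run both steps in sequence, as a framework would across many nodes."""
    return reduce_phase(map_phase(records))
```

In a real cluster the framework would distribute the map calls over input splits and shuffle the pairs by key before the reduce step. The resulting per-day averages are the kind of feature that could then be compared with stock market records.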
