Sentiment Analysis of Investor Opinions on Twitter

The rapid growth of social networks has produced an unprecedented amount of user-generated data, which provides an excellent opportunity for text mining. Sentiment analysis, an important part of text mining, attempts to infer an author’s opinion from the content and structure of a text. Such information is particularly valuable for gauging the overall opinion of a large number of people, for example when predicting box office sales or stock prices. One of the most accessible sources of user-generated data is Twitter, which makes the majority of its user data freely available through its data access API. In this study we seek to predict a sentiment value for stock-related tweets and to demonstrate a correlation between this sentiment and the movement of a company’s stock price in a real-time streaming environment. Both n-gram and “word2vec” textual representation techniques are used alongside a random forest classification algorithm to predict the sentiment of tweets. The predicted sentiment values are then tested for correlation with the stock price of each company. There are significant correlations between price and sentiment for several individual companies. Some companies, such as Microsoft and Walmart, show strong positive correlation, while others, such as Goldman Sachs and Cisco Systems, show strong negative correlation. This suggests that consumer-facing companies are affected differently than other companies. Overall, this appears to be a promising field for future research.
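As a rough illustration of two steps named above, the sketch below counts word-level n-grams in a tweet and computes a correlation coefficient between a sentiment series and a price series. It is a minimal sketch under stated assumptions, not the study's implementation: the abstract does not say which correlation measure or tokenizer was used, so Pearson correlation and whitespace tokenization are assumed here, the function names `word_ngrams` and `pearson` are hypothetical, and the numbers are invented for illustration. The random forest classifier and word2vec features are not shown.

```python
from collections import Counter
from math import sqrt


def word_ngrams(text, n=2):
    """Count word-level n-grams in a tweet (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, invented for illustration only (not from the study).
daily_sentiment = [0.10, 0.35, -0.20, 0.05, 0.40]
daily_price_change = [0.2, 0.9, -0.6, 0.1, 1.1]

features = word_ngrams("buying more $MSFT ahead of earnings", n=2)
r = pearson(daily_sentiment, daily_price_change)
print(f"bigrams: {len(features)}, r = {r:.3f}")
```

A coefficient near +1 would correspond to the positive relationship reported for companies such as Microsoft and Walmart, and a value near -1 to the negative relationship reported for Goldman Sachs and Cisco Systems.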
