Correlating S&P 500 stocks with Twitter data

Twitter is a widely used online social media platform, and one of its important characteristics is its real-time nature. In this paper, we investigate whether the daily number of tweets that mention Standard & Poor's 500 (S&P 500) stocks is correlated with S&P 500 stock indicators (stock price and traded volume) at three levels: the stock market as a whole, industry sectors, and individual company stocks. We further apply a linear regression with exogenous input model to predict stock market indicators, using Twitter data as the exogenous input. Our preliminary results show that the daily number of tweets is correlated with certain stock market indicators at each level. Furthermore, Twitter data appear to help predict the stock market: at the stock market level, whether the S&P 500 closing price will go up or down can be predicted more accurately when Twitter data are included in the model.
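To make the two analyses described above concrete, the following is a minimal sketch in Python, assuming two aligned daily series: a stock indicator `y` (for example, closing price) and a tweet count `u`. The function names, the lag orders `p` and `q`, the intercept term, and the ordinary least-squares fit are illustrative assumptions, not the exact specification used in this paper.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equally long daily series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

def fit_arx(y, u, p=1, q=1):
    """Least-squares fit of an assumed model form:
    y[t] = c + sum_{i=1..p} a_i * y[t-i] + sum_{j=1..q} b_j * u[t-j].
    Returns the coefficients ordered as [c, a_1..a_p, b_1..b_q]."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    m = max(p, q)  # first index with a full set of lagged values
    rows = []
    for t in range(m, len(y)):
        lagged_y = [y[t - i] for i in range(1, p + 1)]
        lagged_u = [u[t - j] for j in range(1, q + 1)]
        rows.append([1.0] + lagged_y + lagged_u)
    X = np.array(rows)
    coef, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    return coef

def predict_next(coef, y, u, p=1, q=1):
    """One-step-ahead forecast from the most recent observations."""
    feats = [1.0] + [y[-i] for i in range(1, p + 1)] \
                  + [u[-j] for j in range(1, q + 1)]
    return float(np.dot(coef, feats))

def direction(y_last, y_pred):
    """Predicted movement of the indicator: +1 for up, -1 for down or flat."""
    return 1 if y_pred > y_last else -1
```

Under these assumptions, the contribution of Twitter data could be assessed by comparing the direction accuracy of this model with that of the same model fitted without the `u` terms.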