Trade the tweet: Social media text mining and sparse matrix factorization for stock market prediction

We investigate whether textual information from user-generated microblogs can be used to predict the stock market. Using the latent space model proposed by Wong et al. (2014), we correlate the movements of stock prices with social media content. Our approach differs from models in prior studies in two significant ways: (1) it leverages market information contained in high-volume social media data rather than news articles, and (2) it does not evaluate sentiment. We test this model on data from 2011 to 2015 covering a majority of the stocks listed in the S&P 500 Index and find that it outperforms a baseline regression. We conclude by presenting a trading strategy that produces an attractive annual return and Sharpe ratio.
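
As a point of reference for the two evaluation metrics named above, the following is a minimal sketch of how an annualized return and a Sharpe ratio are commonly computed from a series of daily strategy returns. It is not the authors' implementation: the function names, the 252-trading-day convention, and the default zero risk-free rate are all assumptions made for illustration.

```python
import math

# Assumption: 252 trading days per year (a common convention, not from the source).
TRADING_DAYS_PER_YEAR = 252


def annualized_return(daily_returns):
    """Geometric annualized return from a list of simple daily returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r  # compound each day's return
    years = len(daily_returns) / TRADING_DAYS_PER_YEAR
    return growth ** (1.0 / years) - 1.0


def annualized_sharpe(daily_returns, daily_risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample
    standard deviation, scaled by sqrt(trading days per year).

    Assumption: the risk-free rate defaults to zero (hypothetical choice).
    Requires at least two observations with non-zero variance.
    """
    excess = [r - daily_risk_free for r in daily_returns]
    n = len(excess)
    mean = sum(excess) / n
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    return mean / math.sqrt(variance) * math.sqrt(TRADING_DAYS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical daily returns for one trading year, for illustration only.
    example = [0.02, 0.0] * 126
    print(annualized_return(example))
    print(annualized_sharpe(example))
```

The Sharpe ratio here uses the sample standard deviation; other conventions (population standard deviation, a non-zero risk-free rate, or a different day count) would change the number slightly.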

[1]  Zhenming Liu,et al.  Stock Market Prediction from WSJ: Text Mining via Sparse Matrix Factorization , 2014, 2014 IEEE International Conference on Data Mining.

[2]  Efthimios Tambouris,et al.  Understanding the Predictive Power of Social Media , 2013 .

[3]  Werner Antweiler,et al.  Is All that Talk Just Noise? The Information Content of Internet Stock Message Boards , 2001 .

[4]  Tim Loughran,et al.  When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks , 2010 .

[5]  Ying Wah Teh,et al.  Text mining for market prediction: A systematic review , 2014, Expert Syst. Appl..

[6]  Hsinchun Chen,et al.  Evaluating sentiment in financial news articles , 2012, Decis. Support Syst..

[7]  Matthew Gentzkow,et al.  Code and Data for the Social Sciences: A Practitioner's Guide , 2014 .

[8]  Jim Kyung-Soo Liew,et al.  Tweet Sentiments and Crowd-Sourced Earnings Estimates as Valuable Sources of Information around Earnings Releases , 2016, The Journal of Alternative Investments.

[9]  J. Liew,et al.  Twitter Sentiment and IPO Performance: A Cross-Sectional Examination , 2016, The Journal of Portfolio Management.

[10]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[11]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[12]  Andrew W. Lo,et al.  The Wisdom of Twitter Crowds: Predicting Stock Market Reactions to FOMC Meetings via Twitter Feeds , 2016, The Journal of Portfolio Management.

[13]  Yin Zhang,et al.  An Alternating Direction Algorithm for Nonnegative Matrix Factorization , 2010 .

[14]  Sofus A. Macskassy,et al.  More than Words: Quantifying Language to Measure Firms' Fundamentals , 2007 .

[15]  Paul C. Tetlock.  Giving Content to Investor Sentiment: The Role of Media in the Stock Market , 2005, The Journal of Finance.

[16]  Tamás Budavári,et al.  The “Sixth” Factor—A Social Media Factor Derived Directly from Tweet Sentiments , 2017, The Journal of Portfolio Management.

[17]  E. Fama.  Efficient Capital Markets: A Review of Theory and Empirical Work , 1970 .