Stock Market Prediction from WSJ: Text Mining via Sparse Matrix Factorization

We revisit the problem of predicting directional movements of stock prices from news articles: our algorithm uses daily articles from The Wall Street Journal to predict the closing stock prices on the same day. We propose a unified latent space model to characterize the "co-movements" between stock prices and news articles. Unlike many existing approaches, our model simultaneously leverages three kinds of correlation: (a) among stock prices, (b) among news articles, and (c) between stock prices and news articles. As a result, it can make daily predictions for more than 500 stocks (most of which are not mentioned in any news article) while keeping complexity low. We carry out extensive backtesting of trading strategies based on our algorithm. The results show that our model achieves a substantially higher accuracy rate (55.7%) than many widely used algorithms. The return (56%) and Sharpe ratio of a trading strategy based on our model are also much higher than those of baseline indices.
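For readers unfamiliar with the two evaluation quantities named above, the following is a minimal sketch of how a directional accuracy rate and an annualized Sharpe ratio are commonly computed from daily data. It is not the evaluation code behind the reported figures: the function names, the zero risk-free rate, and the 252-trading-day annualization are all illustrative assumptions.

```python
import math


def directional_accuracy(predicted, actual):
    """Fraction of days where the predicted direction matches the realized one.

    Both arguments are sequences of signed daily moves (hypothetical inputs);
    a value > 0 counts as "up", anything else as "not up".
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(predicted)


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample std. dev.

    Assumes a constant daily risk-free rate and 252 trading days per year;
    both defaults are assumptions, not values taken from the paper.
    """
    excess = [r - risk_free_daily for r in daily_returns]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(excess) / n
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("returns have zero variance")
    return mean / std * math.sqrt(periods_per_year)
```

An accuracy rate of 55.7% in this sense would mean that the predicted direction matched the realized direction on 55.7% of the evaluated stock-days.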
