Coupling news sentiment with web browsing data predicts intra-day stock prices

The new digital revolution of big data is deeply changing our capability of understanding society and forecasting the outcome of many social and economic systems. Unfortunately, information can be very heterogeneous in the importance, relevance, and surprise it conveys, severely affecting the predictive power of semantic and statistical methods. Here we show that the aggregated behavior of web users can be used to overcome this problem in a hard-to-predict complex system, namely the financial market. Specifically, we show that the combined use of sentiment analysis of news and browsing activity of users of Yahoo! Finance makes it possible to forecast intra-day and daily price changes of a set of 100 highly capitalized US stocks traded in the period 2012-2013. Sentiment analysis or browsing activity, when taken alone, has very small or no predictive power. Conversely, when we consider a news signal that, in a given time interval, is the average sentiment of the clicked news weighted by the number of clicks, we show that for more than 50% of the companies this signal Granger-causes price returns. Our result indicates a "wisdom-of-the-crowd" effect that makes it possible to exploit users' activity to identify and properly weigh the relevant and surprising news, considerably enhancing the forecasting power of the news sentiment.
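The click-weighted news signal described above can be illustrated with a minimal sketch. It assumes each news item is a tuple of timestamp in seconds, sentiment score, and click count, and that intervals are fixed-width buckets. The function name `click_weighted_sentiment`, the one-hour default interval, and the sample data are hypothetical choices for illustration, not the authors' actual pipeline.

```python
from collections import defaultdict


def click_weighted_sentiment(news_items, interval_seconds=3600):
    """Return {interval_start: click-weighted mean sentiment}.

    news_items: iterable of (timestamp_seconds, sentiment, clicks) tuples.
    Intervals in which no news item was clicked are omitted.
    """
    weighted_sum = defaultdict(float)
    total_clicks = defaultdict(int)
    for timestamp, sentiment, clicks in news_items:
        # Assign the item to the fixed-width interval containing its timestamp.
        bucket = (timestamp // interval_seconds) * interval_seconds
        weighted_sum[bucket] += sentiment * clicks
        total_clicks[bucket] += clicks
    return {
        bucket: weighted_sum[bucket] / total_clicks[bucket]
        for bucket in sorted(total_clicks)
        if total_clicks[bucket] > 0
    }


# Hypothetical data: (timestamp in seconds, sentiment, clicks).
items = [
    (0, 1.0, 30),     # heavily clicked positive story
    (600, -1.0, 10),  # less clicked negative story, same hour
    (3700, -0.5, 4),  # next hour
    (4000, 0.0, 0),   # unclicked story contributes nothing
]
signal = click_weighted_sentiment(items)
```

In this toy input the first hour's signal is pulled toward the positive story because it received more clicks. That is the sense in which browsing activity weighs the news. The resulting series could then be tested against price returns with a Granger-causality test.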
