Ensembles of Crowds and Computers: Experiments in Forecasting

This paper explores the power of news sentiment to predict financial returns, in particular the returns of a set of European stocks. Building on past decision support work going back to the Delphi method, this paper describes a text analysis expert weighting algorithm that aggregates the responses of both humans and algorithms by dynamically selecting the best response according to previous performance. The proposed system is tested through an experiment in which ensembles of experts, crowds and machines analyzed Thomson Reuters news stories and predicted the returns of the stocks mentioned in those stories immediately after the stories appeared. In most cases, the expert weighting algorithm performed as well as or better than the best individual algorithm or human. Because the algorithm dynamically selects the best answers from both humans and machines, the result is an evolving collective intelligence: the final decision aggregates the best individual answers, chosen automatically, some of which come from machines and some from humans. Additionally, this paper shows that each group (humans, algorithms, and expert weighting algorithms) is associated with particular news topics from which it makes good predictions.
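The abstract does not spell out the weighting rule, so the following is only a minimal sketch of how an expert weighting scheme can select answers "according to previous performance". It assumes a Weighted Majority style multiplicative update (Littlestone and Warmuth), binary up/down predictions encoded as +1/-1, and a penalty factor `beta = 0.5`; the class `ExpertWeighting` and its method names are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ExpertWeighting:
    """Hypothetical expert weighting sketch (Weighted Majority style).

    Each expert, human or machine, predicts the direction of a stock's
    return after a news story: +1 for up, -1 for down.
    """

    experts: list[str]
    beta: float = 0.5  # assumed penalty factor for a wrong prediction
    weights: dict[str, float] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        # Every expert starts with the same weight.
        self.weights = {name: 1.0 for name in self.experts}

    def best_expert(self) -> str:
        # Expert with the best track record so far; max() returns the
        # first maximal item, so ties go to the earliest listed expert.
        return max(self.experts, key=lambda name: self.weights[name])

    def select(self, predictions: dict[str, int]) -> int:
        # Dynamic selection: answer with the currently best-weighted expert.
        return predictions[self.best_expert()]

    def weighted_vote(self, predictions: dict[str, int]) -> int:
        # Alternative aggregation: weighted majority vote over all experts.
        score = sum(self.weights[name] * pred for name, pred in predictions.items())
        return 1 if score >= 0 else -1

    def update(self, predictions: dict[str, int], outcome: int) -> None:
        # After the realized return is known, multiplicatively shrink
        # the weight of every expert whose prediction was wrong.
        for name, pred in predictions.items():
            if pred != outcome:
                self.weights[name] *= self.beta


ew = ExpertWeighting(["human", "machine_a", "machine_b"])
ew.update({"human": 1, "machine_a": -1, "machine_b": 1}, outcome=-1)
print(ew.best_expert())  # machine_a was the only correct expert
```

Selecting the single best-weighted expert mirrors the "dynamically selecting the best response" described above, while `weighted_vote` shows the softer alternative of blending all answers; which one (or what mix) the actual system uses is not stated here.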
