Predicting Socio-Economic Indicators using News Events

Many socio-economic indicators are sensitive to real-world events. Properly characterizing these events helps identify which of them drive fluctuations in the indicators. In this paper, we propose a novel generative model of real-world events and employ it to extract events from a large corpus of news articles. We introduce the notion of an event class, an abstract grouping of similarly themed events. Event classes are manifested in news articles as event triggers, which are specific words that describe the actions or incidents reported in an article. We use the extracted events to predict fluctuations in different socio-economic indicators. Specifically, we focus on food prices and predict the prices of 12 different crops based on real-world events that potentially influence food price volatility, such as transport strikes and festivals. Our experiments demonstrate that incorporating event information in the prediction tasks reduces the root mean square error (RMSE) of prediction by 22% compared to the standard ARIMA model. We also predict sudden increases in food prices (i.e., spikes) using events as features, and achieve an average 5-10% increase in accuracy compared to baseline models, including an LDA topic-model-based predictive model.
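
To make the RMSE comparison concrete, the following minimal sketch shows how a relative RMSE reduction over a baseline forecast can be computed. The function names `rmse` and `relative_reduction` and all numbers are illustrative assumptions; they are not taken from the paper's data or code, and the baseline forecasts merely stand in for the output of a model such as ARIMA.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error improves on baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical prices and forecasts (made up for illustration only).
actual = [10.0, 12.0, 11.0, 13.0]
baseline_forecast = [12.0, 10.0, 13.0, 11.0]  # stands in for a price-only baseline
event_forecast = [11.0, 11.0, 12.0, 12.0]     # stands in for an event-aware model

baseline_rmse = rmse(actual, baseline_forecast)
event_rmse = rmse(actual, event_forecast)
reduction = relative_reduction(baseline_rmse, event_rmse)
print(f"baseline RMSE={baseline_rmse:.2f}, event RMSE={event_rmse:.2f}, "
      f"reduction={reduction:.0%}")
```

Under this definition, a reported 22% reduction would correspond to `relative_reduction` returning 0.22 when the two models are evaluated on the same held-out prices.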
