Mining the web to predict future events

We describe and evaluate methods for learning to forecast forthcoming events of interest from a corpus containing 22 years of news stories. As examples, we consider identifying significant increases in the likelihood of disease outbreaks, deaths, and riots before these events occur. We provide details of the methods and studies, including the automated extraction and generalization of sequences of events from news corpora and multiple web resources. We evaluate the predictive power of the approach on real-world events withheld from the system.
