Hybrid models for future event prediction

We present a hybrid method to turn off-the-shelf information retrieval (IR) systems into future event predictors. Given a query, a time series model is trained on the publication dates of the retrieved documents to capture trends and periodicity of the associated events. The periodicity of the historical data is then used to estimate a probabilistic model that predicts future bursts. Finally, a hybrid model is obtained by combining the probabilistic and the time series models. Our empirical results on the New York Times corpus show that autocorrelation functions of the time series suffice to classify queries accurately, and that our hybrid models yield more accurate future event predictions than baseline competitors.
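The role of the autocorrelation function can be illustrated with a minimal sketch. It is not the authors' implementation: the monthly bucketing, the function names (`monthly_counts`, `autocorrelation`, `dominant_period`), the 0.5 threshold, and the synthetic publication dates are all assumptions made for illustration. The sketch turns the publication dates of retrieved documents into a count series and checks whether the sample autocorrelation peaks strongly at some lag, which would suggest a periodic query.

```python
from collections import Counter
from datetime import date


def monthly_counts(pub_dates, start_year, end_year):
    """Bucket publication dates into a dense monthly count series."""
    counts = Counter((d.year, d.month) for d in pub_dates)
    return [counts.get((y, m), 0)
            for y in range(start_year, end_year + 1)
            for m in range(1, 13)]


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    var = sum(c * c for c in centered)
    if var == 0:
        # A constant series carries no periodic signal.
        return [0.0] * (max_lag + 1)
    return [sum(centered[t] * centered[t + lag] for t in range(n - lag)) / var
            for lag in range(max_lag + 1)]


def dominant_period(series, max_lag, threshold=0.5):
    """Return the lag with the highest autocorrelation, or None if it is weak.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    acf = autocorrelation(series, max_lag)
    best_lag = max(range(1, max_lag + 1), key=lambda k: acf[k])
    return best_lag if acf[best_lag] >= threshold else None


# Synthetic example: five documents every December from 2000 to 2005.
dates = [date(year, 12, day) for year in range(2000, 2006) for day in range(1, 6)]
series = monthly_counts(dates, 2000, 2005)
print(dominant_period(series, max_lag=24))  # 12
```

A query whose series yields a clear dominant lag could be treated as periodic and passed to a burst model, while a `None` result would leave it to the trend component alone. How the paper actually classifies queries and combines the two models is not specified in this abstract.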
