Answering General Time-Sensitive Queries

Time is an important dimension of relevance for many searches, such as those over blogs and news archives. So far, research on searching such collections has largely focused on locating documents that are topically similar to a query. Unfortunately, topic similarity alone is not always sufficient for document ranking. In this paper, we observe that, for an important class of queries that we call time-sensitive queries, the publication time of the documents in a news archive is important and should be considered together with topic similarity to derive the final document ranking. Earlier work has focused on improving retrieval for "recency" queries, which target recent documents. We propose a more general framework for handling time-sensitive queries: we automatically identify the time intervals that are likely to be of interest for a query, and we then build scoring techniques that integrate this temporal evidence into the overall ranking mechanism. We present an extensive experimental evaluation using a variety of news article data sets, including TREC data as well as real web data with relevance judgments collected through Amazon Mechanical Turk. We examine several techniques for detecting the important time intervals for a query over a news archive and for incorporating this information into the retrieval process. We show that our techniques are robust and significantly improve result quality for time-sensitive queries compared to state-of-the-art retrieval techniques.
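
The general idea of combining topic similarity with publication time can be illustrated with a minimal sketch. This is not the scoring method of the paper, whose details the abstract does not give; it is a simple stand-in that assumes monthly time bins, topic scores already normalized to [0, 1], and a linear mixing weight `lam`. The names `month_bin`, `temporal_weights`, and `rerank` are hypothetical.

```python
from collections import Counter
from datetime import date


def month_bin(d):
    """Map a publication date to a (year, month) bin. Monthly bins are an assumption."""
    return (d.year, d.month)


def temporal_weights(matching_dates):
    """Fraction of the query's matching documents that fall in each time bin."""
    counts = Counter(month_bin(d) for d in matching_dates)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()} if total else {}


def rerank(docs, lam=0.5):
    """Rerank (doc_id, topic_score, pub_date) triples.

    The final score is a linear mix (an illustrative choice) of the topic
    score and a time score, where the time score is the weight of the
    document's bin relative to the busiest bin.
    """
    weights = temporal_weights([pub_date for _, _, pub_date in docs])
    peak = max(weights.values(), default=1.0)
    scored = []
    for doc_id, topic_score, pub_date in docs:
        time_score = weights.get(month_bin(pub_date), 0.0) / peak
        scored.append((doc_id, (1 - lam) * topic_score + lam * time_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: most matching documents cluster in June 2003.
docs = [
    ("a", 0.9, date(2001, 1, 5)),
    ("b", 0.8, date(2003, 6, 1)),
    ("c", 0.7, date(2003, 6, 10)),
    ("d", 0.6, date(2003, 6, 20)),
]
ranking = [doc_id for doc_id, _ in rerank(docs, lam=0.5)]
```

With `lam=0` the ranking reduces to plain topic similarity; with larger values, documents published in the interval where the query's matches concentrate move up, which is the behavior a time-sensitive query would call for under these assumptions.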

[28] Luis Gravano, et al. Answering General Time-Sensitive Queries, 2012, IEEE Trans. Knowl. Data Eng.
