Using temporal bursts for query modeling

We present an approach to query modeling that leverages the temporal distribution of the documents in an initially retrieved result set. In news-related document collections, such distributions tend to exhibit bursts, where a burst is a time period in which unusually many documents are published. Our approach detects bursts in the result list returned for a query. We then model the term distribution of each burst using a reduced result list and select the most descriptive terms of each burst. Finally, we merge the resulting term sets to arrive at a reformulation of the original query. For query sets that consist of both temporal and non-temporal queries, our query modeling approach incorporates an effective term selection method. We consistently and significantly improve over various baselines, such as relevance models, on both news collections and a collection of blog posts.
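The pipeline described above (detect bursts, select descriptive terms per burst, merge them into the query) can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the burst criterion (daily count above mean plus a multiple of the standard deviation), the term score (a pointwise Kullback-Leibler contribution against the whole result list), and the function names `detect_bursts`, `burst_terms`, and `expand_query` are all assumptions made for illustration. The sketch also skips the reduced result list and any term weighting.

```python
from collections import Counter
from math import log
from statistics import mean, pstdev


def detect_bursts(docs, threshold=1.0):
    """Group consecutive days whose document count exceeds
    mean + threshold * std of the daily counts (assumed burst criterion).
    docs is a non-empty list of (day_number, tokens) pairs."""
    counts = Counter(day for day, _ in docs)
    first, last = min(counts), max(counts)
    # Days without documents count as zero in the time series.
    series = [counts.get(d, 0) for d in range(first, last + 1)]
    cutoff = mean(series) + threshold * pstdev(series)
    bursts, current = [], []
    for day in range(first, last + 1):
        if counts.get(day, 0) > cutoff:
            current.append(day)
        elif current:
            bursts.append(current)
            current = []
    if current:
        bursts.append(current)
    return bursts


def burst_terms(docs, burst_days, k=3):
    """Return the k terms most over-represented in the burst documents
    relative to the whole result list (assumed scoring function)."""
    days = set(burst_days)
    background = Counter(t for _, toks in docs for t in toks)
    burst = Counter(t for day, toks in docs if day in days for t in toks)
    bg_total, b_total = sum(background.values()), sum(burst.values())
    scores = {}
    for term, freq in burst.items():
        p_burst = freq / b_total
        p_bg = background[term] / bg_total
        scores[term] = p_burst * log(p_burst / p_bg)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def expand_query(query_terms, docs, threshold=1.0, k=3):
    """Merge the top terms of every detected burst into the original query."""
    expanded = list(query_terms)
    for burst_days in detect_bursts(docs, threshold):
        for term in burst_terms(docs, burst_days, k):
            if term not in expanded:
                expanded.append(term)
    return expanded
```

Given a result list in which one day holds far more documents than the rest, `expand_query` appends the terms that dominate that day to the original query terms. A fuller version would weight each term, for example by burst size, rather than produce a flat list.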
