How fresh do you want your search results?

Researchers have recognized the importance of temporal features for improving the performance of information retrieval systems. Specifically, the timeliness of a web document can be a significant factor in determining whether it is relevant to a search query. Previous work has proposed time-aware retrieval models with a particular focus on news queries, where recent web documents related to a real-world event are generally preferable. These queries typically exhibit bursts in the volume of published documents or submitted queries. However, no work has studied the role of time in queries such as "credit card overdraft fees", which show no major spikes in either document or query volume over time yet still favor more recently published documents. In this work, we focus on this class of queries, which we refer to as "timely queries". We show that the change over time in the term distribution of the results of timely queries is strongly correlated with users' perception of time sensitivity. Based on this observation, we propose a method to estimate a query's timeliness requirements, along with principled ways to incorporate document freshness into the ranking model. Our study shows that our method yields a more accurate estimate of timeliness than volume-based approaches. We experimentally compare our ranking strategy with other time-sensitive and non-time-sensitive ranking algorithms and show that it improves retrieval quality for timely queries.
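The abstract's central observation, that the term distribution of a timely query's results drifts over time, can be illustrated with a small sketch. This is not the authors' method: the abstract does not say which divergence measure or time windows the paper uses, so the Jensen-Shannon divergence, the whitespace tokenizer, and the names `term_distribution`, `js_divergence`, and `timeliness_score` are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Normalized term frequencies over a list of result texts (assumed tokenizer: whitespace)."""
    counts = Counter()
    for text in docs:
        counts.update(text.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two term distributions."""
    vocab = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the joint vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a):
        # KL(a || m); terms with zero probability in a contribute nothing.
        return sum(pa * math.log2(pa / m[t]) for t, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def timeliness_score(results_by_period):
    """Mean divergence between consecutive periods (hypothetical proxy for timeliness)."""
    dists = [term_distribution(docs) for docs in results_by_period]
    pairs = list(zip(dists, dists[1:]))
    if not pairs:
        return 0.0
    return sum(js_divergence(a, b) for a, b in pairs) / len(pairs)
```

Here `results_by_period` is assumed to be a list of time periods, each holding the texts of the results retrieved for the same query in that period. Under this sketch, a query whose results use the same vocabulary in every period scores near 0, while one whose result vocabulary changes substantially between periods scores closer to 1.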

Vagelis Hristidis | Anastasios Arvanitis | Shiwen Cheng
