Selectively diversifying web search results

Search result diversification is a natural approach for tackling ambiguous queries. Nevertheless, not all queries are equally ambiguous, and hence different queries could benefit from different diversification strategies. Existing approaches typically encode a more lenient or more aggressive diversification strategy as a trade-off between promoting relevance and promoting diversity in the search results. In this paper, we propose to learn such a trade-off on a per-query basis. In particular, we examine how the need for diversification can be learnt for each query: given a diversification approach and an unseen query, we predict an effective trade-off between relevance and diversity based on similar previously seen queries. Thorough experiments using the TREC ClueWeb09 collection show that our selective approach can significantly outperform a uniform diversification strategy for both classical and state-of-the-art diversification approaches.
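
To make the idea of a per-query trade-off concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes that each query is described by a numeric feature vector, that a best-performing trade-off value has already been recorded for each training query, and that the prediction for an unseen query is simply the average over its k nearest training queries. The function names (`predict_tradeoff`, `mmr_rerank`), the Euclidean distance, and the MMR-style greedy re-ranking used to apply the trade-off are all illustrative assumptions.

```python
import math


def predict_tradeoff(query_features, training_set, k=3):
    """Predict a relevance/diversity trade-off for an unseen query.

    training_set is a list of (feature_vector, best_tradeoff) pairs for
    previously seen queries (hypothetical data layout). The prediction is
    the mean trade-off of the k nearest training queries.
    """
    by_distance = sorted(
        training_set,
        key=lambda item: math.dist(query_features, item[0]),
    )
    neighbours = by_distance[:k]
    return sum(tradeoff for _, tradeoff in neighbours) / len(neighbours)


def mmr_rerank(relevance, similarity, tradeoff, depth):
    """Greedy MMR-style re-ranking (an assumed stand-in diversifier).

    tradeoff = 1.0 ranks purely by relevance; lower values penalise
    documents that are similar to those already selected.
    """
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < depth:

        def score(doc):
            # Redundancy: highest similarity to any already selected document.
            redundancy = max(
                (similarity[doc][chosen] for chosen in selected), default=0.0
            )
            return tradeoff * relevance[doc] - (1.0 - tradeoff) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: two clusters of training queries with different best trade-offs.
training = [
    ((0.0, 0.0), 0.2),
    ((0.1, 0.0), 0.4),
    ((1.0, 1.0), 0.9),
    ((0.9, 1.0), 0.8),
]
predicted = predict_tradeoff((0.05, 0.0), training, k=2)

relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
similarity = {
    "a": {"b": 0.95, "c": 0.1},
    "b": {"a": 0.95, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1},
}
uniform_ranking = mmr_rerank(relevance, similarity, 1.0, depth=3)
diverse_ranking = mmr_rerank(relevance, similarity, 0.5, depth=3)
```

In this toy setting, a trade-off of 1.0 keeps the near-duplicate documents `a` and `b` adjacent, whereas a trade-off of 0.5 promotes the dissimilar document `c` above `b`. A selective approach would choose the value per query instead of fixing one value for all queries.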
