Intent-aware search result diversification

Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach is to explicitly model the possible aspects underlying a query, so as to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, these aspects may themselves represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to the different aspects. In this paper, we propose to diversify the results retrieved for a given query by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve upon state-of-the-art diversification approaches.
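The idea of combining explicit aspect modelling with intent-aware relevance estimates can be illustrated with a small sketch. The abstract does not give the paper's formulation, so everything below is an assumption made for illustration: the greedy selection loop, the trade-off parameter `lam`, the function names `aspect_relevance` and `diversify`, and the toy scores are all hypothetical. The sketch assumes that each aspect comes with predicted intent probabilities (which would be learned in practice) and that one retrieval model per intent has already scored the documents for each aspect, with scores in [0, 1].

```python
from typing import Dict, List

# Hypothetical data layout (an assumption, not taken from the paper):
#   model_scores[intent][aspect][doc] -> score in [0, 1] from the model suited to that intent
#   intent_probs[aspect][intent]      -> predicted probability that the aspect has that intent
Scores = Dict[str, Dict[str, Dict[str, float]]]
Probs = Dict[str, Dict[str, float]]


def aspect_relevance(doc: str, aspect: str, model_scores: Scores, intent_probs: Probs) -> float:
    """Mix the intent-specific model scores by the aspect's predicted intent probabilities."""
    return sum(
        p * model_scores.get(intent, {}).get(aspect, {}).get(doc, 0.0)
        for intent, p in intent_probs[aspect].items()
    )


def diversify(candidates: List[str], query_rel: Dict[str, float],
              aspect_weights: Dict[str, float], model_scores: Scores,
              intent_probs: Probs, lam: float = 0.5, k: int = 10) -> List[str]:
    """Greedily build a ranking that trades off query relevance against aspect coverage."""
    selected: List[str] = []
    remaining = list(candidates)
    # not_covered[a]: estimated probability that no selected document satisfies aspect a
    not_covered = {a: 1.0 for a in aspect_weights}

    def gain(doc: str) -> float:
        diversity = sum(
            w * aspect_relevance(doc, a, model_scores, intent_probs) * not_covered[a]
            for a, w in aspect_weights.items()
        )
        return (1.0 - lam) * query_rel.get(doc, 0.0) + lam * diversity

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
        # Discount aspects that the newly selected document already covers.
        for a in aspect_weights:
            not_covered[a] *= 1.0 - aspect_relevance(best, a, model_scores, intent_probs)
    return selected


# Toy example: aspect a1 is treated as navigational, a2 as informational.
candidates = ["d1", "d2", "d3"]
query_rel = {"d1": 0.9, "d2": 0.8, "d3": 0.3}
aspect_weights = {"a1": 0.5, "a2": 0.5}
intent_probs = {"a1": {"nav": 1.0, "inf": 0.0}, "a2": {"nav": 0.0, "inf": 1.0}}
model_scores = {
    "nav": {"a1": {"d1": 0.9, "d2": 0.8}},
    "inf": {"a2": {"d3": 0.9}},
}

diverse = diversify(candidates, query_rel, aspect_weights, model_scores, intent_probs, lam=0.8, k=3)
baseline = diversify(candidates, query_rel, aspect_weights, model_scores, intent_probs, lam=0.0, k=3)
print(diverse, baseline)
```

In this toy setting, a high `lam` promotes d3, the only document covering the informational aspect, above d2, which is redundant with d1 on the navigational aspect. With `lam=0.0` the ranking falls back to plain query relevance.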
