Evaluating Search Result Diversity using Intent Hierarchies

Search result diversification aims to return diversified document lists that cover different user intents for ambiguous or broad queries. Existing diversity measures assume that user intents are independent or exclusive and do not consider the relationships among them. In this paper, we introduce intent hierarchies to model these relationships and, based on them, propose several hierarchical measures that take the relationships among intents into account. We demonstrate the feasibility of the hierarchical measures using a new test collection based on the TREC Web Track 2009-2013 diversity test collections. Our main experimental findings are: (1) hierarchical measures are generally more discriminative and intuitive than existing measures that use flat lists of intents; (2) when queries have multilayer intent hierarchies, hierarchical measures are less correlated with existing measures but gain more in discriminative power; (3) hierarchical measures are more intuitive in terms of diversity or relevance, and measures that use the whole intent hierarchy are more intuitive, in terms of both diversity and relevance, than those that use only the leaf nodes.
