Diversity by proportionality: an election-based approach to search result diversification

This paper presents a different perspective on diversity in search results: diversity by proportionality. We consider a result list most diverse, with respect to some set of topics related to the query, when the number of documents it provides on each topic is proportional to the topic's popularity. Accordingly, we propose a framework for optimizing proportionality in search result diversification, motivated by the problem of assigning seats to members of competing political parties. Our technique iteratively determines, for each position in the ranked result list, the topic that best maintains the overall proportionality, and then selects the best document on that topic for that position. We demonstrate empirically that our method significantly outperforms the top-performing approach in the literature, not only on our proposed proportionality metric but also on several standard diversity measures. This result indicates that promoting proportionality naturally leads to minimal redundancy, which is a goal of current diversity approaches.
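
The position-by-position procedure can be illustrated with a minimal sketch. The abstract does not name the seat-allocation rule, so the code below assumes the Sainte-Laguë highest-quotient method (a standard way of apportioning parliamentary seats) as the criterion for "the topic that best maintains the overall proportionality". The function name `proportional_rerank`, the inputs `topic_popularity` (treated as the topics' votes) and `doc_topic_score` (a per-topic relevance score for each candidate), and the rule of adding one seat to the winning topic per position are all hypothetical choices for illustration, not the paper's exact formulation.

```python
from typing import Dict, List, Sequence


def proportional_rerank(
    candidates: Sequence[str],
    topic_popularity: Dict[str, float],
    doc_topic_score: Dict[str, Dict[str, float]],
    k: int,
) -> List[str]:
    """Hypothetical sketch: fill up to k positions, giving each position to the
    topic with the highest Sainte-Lague quotient and then to that topic's
    best-scoring remaining document (assumed rule, not the paper's exact one)."""
    seats = {t: 0 for t in topic_popularity}  # positions already won by each topic
    remaining = list(candidates)
    ranking: List[str] = []
    while len(ranking) < k and remaining:
        # Sainte-Lague quotient: votes / (2 * seats + 1); ties go to the first topic listed.
        topic = max(topic_popularity,
                    key=lambda t: topic_popularity[t] / (2 * seats[t] + 1))
        # Best remaining document for that topic; documents with no score for it count as 0.0.
        doc = max(remaining, key=lambda d: doc_topic_score.get(d, {}).get(topic, 0.0))
        ranking.append(doc)
        remaining.remove(doc)
        seats[topic] += 1  # the winning topic now holds one more position
    return ranking


if __name__ == "__main__":
    popularity = {"a": 0.5, "b": 0.3, "c": 0.2}
    scores = {
        "d1": {"a": 0.9, "b": 0.1},
        "d2": {"a": 0.8},
        "d3": {"b": 0.7},
        "d4": {"c": 0.6},
        "d5": {"a": 0.5, "c": 0.4},
        "d6": {"b": 0.2},
    }
    print(proportional_rerank(list(scores), popularity, scores, 4))
    # prints ['d1', 'd3', 'd4', 'd2']
```

With popularities 0.5, 0.3 and 0.2 and four positions, the quotients give topic a two positions and topics b and c one each, which is as close to the exact proportional shares (2, 1.2 and 0.8) as whole positions allow. Sainte-Laguë is used here because its odd divisors (1, 3, 5, ...) do not systematically favour the largest topic; other divisor rules could be substituted in the `key` function.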
