On query result diversification

In this paper we describe a general framework for evaluating and optimizing methods for diversifying query results. In these methods, an initial ranking candidate set produced by a query is used to construct a result set whose elements are ranked with respect to both relevance and diversity: the retrieved elements should be as relevant as possible to the query and, at the same time, the result set should be as diverse as possible. While addressing relevance is relatively simple and has been heavily studied, diversity is a harder problem to solve. One major contribution of this paper is that, using this framework, we adapt, implement, and evaluate several existing methods for diversifying query results. We also propose two new approaches, namely the Greedy with Marginal Contribution (GMC) and the Greedy Randomized with Neighborhood Expansion (GNE) methods. Another major contribution is the first thorough experimental evaluation of the various diversification techniques implemented in a common framework. We examine the methods' performance with respect to precision, running time, and quality of the result. Our experimental results show that, although the proposed methods have higher running times, they achieve precision very close to the optimal and also provide the best result quality. GMC is deterministic, whereas the randomized approach (GNE) can achieve better result quality if the user is willing to trade off running time.
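To make the relevance/diversity trade-off concrete, the following is a minimal sketch of a generic greedy selection in the style of maximal marginal relevance. It is not GMC or GNE, whose details are not given here. The function name `greedy_diversify`, the `trade_off` parameter, and the scoring formula are illustrative assumptions rather than the paper's definitions.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_diversify(
    candidates: Sequence[T],
    relevance: Callable[[T], float],
    distance: Callable[[T, T], float],
    k: int,
    trade_off: float = 0.5,
) -> List[T]:
    """Pick up to k items, balancing relevance against diversity.

    Assumption: the score of an item is
        (1 - trade_off) * relevance + trade_off * (distance to nearest selected item),
    a generic MMR-style heuristic, not the objective used in the paper.
    """
    selected: List[T] = []
    remaining = list(candidates)

    while remaining and len(selected) < k:

        def score(item: T) -> float:
            # Diversity term: distance to the closest already-selected item
            # (0.0 while nothing has been selected yet).
            diversity = min((distance(item, s) for s in selected), default=0.0)
            return (1.0 - trade_off) * relevance(item) + trade_off * diversity

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected


if __name__ == "__main__":
    # Hypothetical 1-D data: the query sits at 0, relevance decays with distance.
    points = [0.0, 0.1, 0.2, 5.0]
    rel = lambda x: 1.0 / (1.0 + abs(x))
    dist = lambda a, b: abs(a - b)
    print(greedy_diversify(points, rel, dist, k=2, trade_off=0.0))
    print(greedy_diversify(points, rel, dist, k=2, trade_off=0.5))
```

With `trade_off=0.0` the sketch reduces to plain top-k by relevance. Raising it pushes the selection toward items that are far from those already chosen, at the cost of relevance.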
