An axiomatic approach for result diversification

Understanding user intent is key to designing an effective ranking system in a search engine. In the absence of any explicit knowledge of user intent, search engines want to diversify results to improve user satisfaction. In such a setting, the approach based on the probability ranking principle, which presents the most relevant results on top, can be sub-optimal, and hence the search engine would like to trade off relevance for diversity in the results. In analogy to prior work on ranking and clustering systems, we use the axiomatic approach to characterize and design diversification systems. We develop a set of natural axioms that a diversification system is expected to satisfy, and show that no diversification function can satisfy all the axioms simultaneously. We illustrate the use of the axiomatic framework by providing three example diversification objectives that satisfy different subsets of the axioms. We also uncover a rich link to the facility dispersion problem that results in algorithms for a number of diversification objectives. Finally, we propose an evaluation methodology to characterize the objectives and the underlying axioms. We conduct a large-scale evaluation of our objectives based on two data sets: a data set derived from the Wikipedia disambiguation pages and a product database.
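The trade-off between relevance and diversity, and its connection to dispersion-style selection, can be illustrated with a small sketch. The abstract does not state the three objectives, so everything below is an assumption made for illustration: the objective (sum of relevance plus a weight `lam` times the sum of pairwise distances), the greedy selection rule, the names `objective` and `greedy_diversify`, and the toy documents and scores. It is not the paper's algorithm and carries no approximation guarantee.

```python
from itertools import combinations


def objective(selected, relevance, distance, lam):
    """Illustrative score: total relevance plus lam times total pairwise distance."""
    rel = sum(relevance[d] for d in selected)
    div = sum(distance(a, b) for a, b in combinations(selected, 2))
    return rel + lam * div


def greedy_diversify(candidates, relevance, distance, k, lam):
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        # Marginal gain of d: its relevance plus its weighted distance
        # to everything already selected.
        best = max(
            remaining,
            key=lambda d: relevance[d] + lam * sum(distance(d, s) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data for an ambiguous query: two senses of one word.
relevance = {"jaguar_car_1": 0.9, "jaguar_car_2": 0.85, "jaguar_animal": 0.6}
topic = {"jaguar_car_1": "car", "jaguar_car_2": "car", "jaguar_animal": "animal"}


def topic_distance(a, b):
    """Assumed distance: 0 for documents on the same topic, 1 otherwise."""
    return 0.0 if topic[a] == topic[b] else 1.0


docs = list(relevance)
by_relevance = greedy_diversify(docs, relevance, topic_distance, k=2, lam=0.0)
diversified = greedy_diversify(docs, relevance, topic_distance, k=2, lam=0.5)
print(by_relevance)
print(diversified)
```

With `lam=0.0` the selection reduces to ranking by relevance alone and returns both car documents. With `lam=0.5` the second slot goes to the less relevant animal document, because its distance from the first pick outweighs the relevance gap.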
