Search Result Diversification

Ranking in information retrieval has traditionally been approached as a pursuit of relevant information, under the assumption that the users' information needs are unambiguously conveyed by their submitted queries. Nevertheless, as an inherently limited representation of a more complex information need, every query can arguably be considered ambiguous to some extent. In order to tackle query ambiguity, search result diversification approaches have recently been proposed to produce rankings aimed at satisfying the multiple possible information needs underlying a query. In this survey, we review the published literature on search result diversification. In particular, we discuss the motivations for diversifying the search results for an ambiguous query and provide a formal definition of the search result diversification problem. In addition, we describe the most successful approaches in the literature for producing and evaluating diversity in multiple search domains. Finally, we also discuss recent advances as well as open research directions in the field of search result diversification.
