Efficient diversity-aware search

Typical approaches to ranking information in response to a user's query return the most relevant results but ignore important factors that contribute to user satisfaction; for instance, the contents of a result document may be redundant given the results already examined. Motivated by emerging applications, in this work we study the problem of Diversity-Aware Search, the essence of which is ranking search results based on both their relevance and their dissimilarity to the other results reported. Diversity-Aware Search is generally a hard problem, and even its tractable instances cannot be solved efficiently by adapting existing approaches. We propose DIVGEN, an efficient algorithm for diversity-aware search, which achieves significant performance improvements via novel data access primitives. Although selecting the optimal schedule of data accesses is a hard problem, we devise the first low-overhead data access prioritization scheme with theoretical quality guarantees and good performance in practice. A comprehensive evaluation on large-scale real and synthetic corpora demonstrates the efficiency and effectiveness of our approach.
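
To make the notion of ranking by both relevance and dissimilarity concrete, the following is a minimal sketch of a generic greedy, MMR-style re-ranking heuristic. It is not DIVGEN and does not model the data access primitives or the prioritization scheme described above. The function names, the `trade_off` parameter, the linear scoring rule, and the use of Jaccard distance over term sets are all assumptions made for illustration only.

```python
from typing import Dict, List, Set


def jaccard_dissimilarity(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def greedy_diverse_rank(
    relevance: Dict[str, float],
    terms: Dict[str, Set[str]],
    k: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Pick up to k documents, one at a time (hypothetical helper).

    Each step selects the document maximizing
        trade_off * relevance + (1 - trade_off) * novelty,
    where novelty is the smallest dissimilarity to any document
    already selected (1.0 while nothing has been selected).
    """
    selected: List[str] = []
    remaining = set(relevance)
    while remaining and len(selected) < k:
        def score(doc: str) -> float:
            if selected:
                novelty = min(
                    jaccard_dissimilarity(terms[doc], terms[s]) for s in selected
                )
            else:
                novelty = 1.0
            return trade_off * relevance[doc] + (1.0 - trade_off) * novelty

        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with relevance scores `{"a": 0.9, "b": 0.85, "c": 0.6}`, where `a` and `b` share most of their terms and `c` covers a different topic, setting `trade_off=1.0` (relevance only) picks `a` and then `b`, whereas `trade_off=0.5` picks `a` and then `c`, because `b` is largely redundant once `a` has been reported. This sketch scores every remaining candidate at every step, which is the kind of exhaustive access a scalable method would want to avoid.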
