Efficient diversity-aware search

Typical approaches to ranking information in response to a user's query return the most relevant results but ignore important factors contributing to user satisfaction; for instance, the contents of a result document may be redundant given the results already examined. Motivated by emerging applications, in this work we study the problem of Diversity-Aware Search, the essence of which is ranking search results based on both their relevance and their dissimilarity to the other results reported. Diversity-Aware Search is generally a hard problem, and even tractable instances thereof cannot be efficiently solved by adapting existing approaches. We propose DIVGEN, an efficient algorithm for diversity-aware search, which achieves significant performance improvements via novel data access primitives. Although selecting the optimal schedule of data accesses is a hard problem, we devise the first low-overhead data access prioritization scheme with theoretical quality guarantees and good performance in practice. A comprehensive evaluation on real and synthetic large-scale corpora demonstrates the efficiency and effectiveness of our approach.
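To make the problem statement concrete, the following is a minimal sketch of ranking by both relevance and dissimilarity to results already chosen. It is not DIVGEN and says nothing about the paper's data access primitives; it is a generic greedy, MMR-style selection. The names `greedy_diverse_top_k`, `trade_off`, and `jaccard`, as well as the linear scoring formula, are assumptions introduced here for illustration only.

```python
from typing import Callable, Sequence


def greedy_diverse_top_k(
    candidates: Sequence[str],
    relevance: Callable[[str], float],
    similarity: Callable[[str, str], float],
    k: int,
    trade_off: float = 0.5,
) -> list[str]:
    """Pick up to k results greedily, balancing relevance against redundancy.

    Hypothetical scoring (an assumption, not the paper's objective):
    trade_off * relevance - (1 - trade_off) * max similarity to chosen results.
    """
    selected: list[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def marginal_score(doc: str) -> float:
            # Redundancy is the highest similarity to anything already chosen;
            # it is 0.0 while nothing has been selected yet.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return trade_off * relevance(doc) - (1.0 - trade_off) * redundancy

        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected


def jaccard(a: str, b: str) -> float:
    """Illustrative token-overlap similarity between two short texts."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0
```

With three candidates where two are near-duplicates, a plain relevance ranking would return both duplicates, whereas this sketch is expected to return the most relevant one followed by the dissimilar one. Each greedy step scans all remaining candidates, which is exactly the kind of cost an efficient algorithm would aim to avoid.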

Albert Angel | Nick Koudas
