Result diversification based on query-specific cluster ranking

Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that builds a ranked list by selecting documents from different clusters in a round-robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework: ranking clusters and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, while there should be no dominantly large clusters. Also, documents from these high-quality clusters should have diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework. © 2011 Wiley Periodicals, Inc.
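The round-robin selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `round_robin_diversify`, its parameters, and the document identifiers are hypothetical. It assumes the clusters have already been ranked, that documents inside each cluster are ordered by their retrieval score, and that documents outside the selected clusters are simply left out, a detail the abstract does not specify.

```python
from typing import List, Sequence


def round_robin_diversify(
    ranked_clusters: Sequence[Sequence[str]],
    top_k_clusters: int,
    list_size: int,
) -> List[str]:
    """Interleave documents from the top-ranked clusters, one per cluster per pass.

    Assumption: ranked_clusters is ordered best cluster first, and each
    cluster lists its documents best first.
    """
    # Cluster selection: restrict diversification to the top-ranked clusters.
    selected = [list(cluster) for cluster in ranked_clusters[:top_k_clusters]]

    result: List[str] = []
    seen = set()  # a document may belong to more than one cluster
    depth = 0     # position inside each cluster visited in the current pass

    while len(result) < list_size and any(depth < len(c) for c in selected):
        for cluster in selected:
            if depth >= len(cluster):
                continue  # this cluster is exhausted
            doc = cluster[depth]
            if doc in seen:
                continue  # already placed via another cluster
            seen.add(doc)
            result.append(doc)
            if len(result) == list_size:
                break
        depth += 1

    return result


if __name__ == "__main__":
    clusters = [["d1", "d4", "d6"], ["d2", "d5"], ["d3"]]
    print(round_robin_diversify(clusters, top_k_clusters=2, list_size=4))
```

With the toy input above, only the two best clusters are used, so the list alternates between them and the document in the third cluster never appears. This mirrors the abstract's idea that diversification is restricted to clusters likely to contain relevant documents.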

Maarten de Rijke | Edgar Meij | Jiyin He
