Paper information - Fusion helps diversification

Fusion helps diversification

A popular strategy for search result diversification is to first retrieve a set of documents using a standard retrieval method and then rerank the results. We adopt a different perspective on the problem, based on data fusion. Starting from the hypothesis that data fusion can improve performance in terms of diversity metrics, we examine the impact of standard data fusion methods on result diversification. We take the output of a set of rankers, optimized for diversity or not, and find that data fusion can significantly improve state-of-the-art diversification methods. We also introduce a new data fusion method, called diversified data fusion (DDF), which infers latent topics of a query using topic modeling, without leveraging outside information. Our experiments show that data fusion methods can enhance the performance of diversification and that DDF significantly outperforms existing data fusion methods in terms of diversity metrics.
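To make the idea of fusing the output of several rankers concrete, the following is a minimal sketch of two widely used score-based fusion rules, CombSUM and CombMNZ. The abstract does not say which standard fusion methods were evaluated, so the choice of these two rules, the min-max normalization, the function names, and the toy scores are all assumptions made for illustration. This is not the DDF method itself.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale one ranker's scores into [0, 1]; a constant run maps to 1.0."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_fuse(runs, use_mnz=False):
    """Fuse several {doc_id: score} runs into a single ranked list.

    CombSUM adds up the normalized scores of each document.
    CombMNZ additionally multiplies that sum by the number of runs
    that returned the document.
    """
    summed = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            summed[doc] += score
            hits[doc] += 1
    fused = {
        doc: summed[doc] * (hits[doc] if use_mnz else 1) for doc in summed
    }
    # Sort by fused score, highest first; break ties by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical output of two rankers for the same query.
runs = [
    {"d1": 3.0, "d2": 2.0, "d3": 1.0},
    {"d2": 0.9, "d4": 0.5, "d1": 0.1},
]
print(comb_fuse(runs))
print(comb_fuse(runs, use_mnz=True))
```

Documents returned by several rankers accumulate score, which is one intuition for why fusion can surface documents covering different aspects of a query when the input rankers emphasize different aspects.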
