Fusion helps diversification

A popular strategy for search result diversification is to first retrieve a set of documents using a standard retrieval method and then rerank the results. We adopt a different perspective on the problem, based on data fusion. Starting from the hypothesis that data fusion can improve performance in terms of diversity metrics, we examine the impact of standard data fusion methods on result diversification. We fuse the output of a set of rankers, whether or not they are optimized for diversity, and find that data fusion can significantly improve on state-of-the-art diversification methods. We also introduce a new data fusion method, called diversified data fusion (DDF), which infers the latent topics of a query using topic modeling, without using external information. Our experiments show that data fusion methods can enhance diversification performance and that DDF significantly outperforms existing data fusion methods in terms of diversity metrics.
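
To make "standard data fusion" concrete, here is a minimal sketch of two widely used score-based fusion rules, CombSUM and CombMNZ. The abstract does not say which standard methods were evaluated, so the choice of these two rules, the min-max normalization step, and all names (`min_max_normalize`, `fuse`, the toy runs) are illustrative assumptions rather than the authors' setup. This is not DDF, which additionally relies on topic modeling.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one ranker's scores to [0, 1] so runs are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every document as equally good.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def fuse(runs, method="combsum"):
    """Fuse several {doc_id: score} runs into one ranked list of doc ids.

    CombSUM sums the normalized scores of a document across runs.
    CombMNZ multiplies that sum by the number of runs that returned it.
    """
    total = defaultdict(float)  # summed normalized score per document
    hits = defaultdict(int)     # number of runs that returned the document
    for run in runs:
        if not run:
            continue  # an empty run contributes nothing
        for doc, score in min_max_normalize(run).items():
            total[doc] += score
            hits[doc] += 1
    if method == "combsum":
        fused = dict(total)
    elif method == "combmnz":
        fused = {doc: total[doc] * hits[doc] for doc in total}
    else:
        raise ValueError(f"unknown fusion method: {method}")
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical toy runs from two rankers (scores are made up).
run_a = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
run_b = {"d2": 9.0, "d4": 5.0, "d1": 1.0}
print(fuse([run_a, run_b], method="combsum"))  # ['d2', 'd1', 'd4', 'd3']
```

In this toy case, `d2` rises to the top because both rankers score it well, which is the basic effect fusion exploits: documents supported by several rankers are promoted over documents favored by only one.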
