The early fusion strategy for search result diversification

A typical strategy for search result diversification is a two-stage process: first, a traditional search engine produces a ranked list of documents in which relevance is the only concern; then, the results are re-ranked to promote diversity. In recent years, some researchers have investigated how data fusion can improve search result diversity. Corresponding to the two stages of search result diversification, data fusion may be applied at either stage. All previous investigations focus on fusion at the second stage, that is, fusing multiple results that are already diversified. In this paper, we investigate an alternative: fusing multiple results at the first stage and then diversifying the fused result with a re-ranking algorithm. Experiments are carried out with three groups of results submitted to the TREC web adhoc task. We find that the proposed early fusion strategy is effective: its performance is slightly better than that of second-stage fusion. A further advantage is that it can be implemented more efficiently than second-stage fusion, because diversification is applied only to the fused result.
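The early fusion pipeline described above can be sketched as follows. This is a minimal illustration only: the abstract names neither the fusion method nor the re-ranking algorithm, so the sketch assumes CombSUM with min-max score normalization for the first-stage fusion and an MMR-style greedy re-ranker for the diversification step. The Jaccard similarity over term sets and all function and parameter names (`comb_sum`, `mmr_rerank`, `early_fusion_diversify`, `lam`) are hypothetical choices made for the example.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale one system's scores into [0, 1] (assumed normalization)."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_sum(runs):
    """First-stage fusion: sum the normalized scores of each document."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return dict(fused)


def jaccard(a, b):
    """Similarity of two term sets (a stand-in for any document similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_rerank(relevance, doc_terms, k, lam=0.5):
    """Greedy MMR-style re-ranking: trade relevance against redundancy."""
    if not relevance:
        return []
    max_rel = max(relevance.values()) or 1.0
    candidates = set(relevance)
    selected = []
    while candidates and len(selected) < k:
        def mmr(doc):
            # Redundancy is the highest similarity to any selected document.
            redundancy = max(
                (jaccard(doc_terms[doc], doc_terms[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[doc] / max_rel - (1 - lam) * redundancy

        # Sorting makes tie-breaking deterministic (lowest doc id wins).
        best = max(sorted(candidates), key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


def early_fusion_diversify(runs, doc_terms, k, lam=0.5):
    """Fuse relevance-only runs first, then diversify the fused list once."""
    return mmr_rerank(comb_sum(runs), doc_terms, k, lam)
```

In this arrangement the re-ranker runs once, on the fused list. Second-stage fusion would instead diversify every component result before fusing, which is one way to see why the early variant can be cheaper.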
