On the Selection of the Best Retrieval Result Per Query - An Alternative Approach to Data Fusion

Some recent works have shown that a "perfect" selection of the best IR system per query could lead to a significant improvement in retrieval performance. Motivated by this fact, in this paper we focus on the automatic selection of the best retrieval result from a given set of result lists generated by different IR systems. In particular, we propose five heuristic measures for evaluating the relative relevance of each result list, which take into account the redundancy and ranking of documents across the lists. Preliminary results on three different data sets, covering 216 queries, are encouraging. They show that the proposed approach can slightly outperform the best individual IR system on two of the three collections, and that it can significantly improve on the average performance of the individual systems across all three data sets. In addition, these results indicate that our approach is a competitive alternative to traditional data fusion methods.
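To make the idea concrete, below is a minimal sketch of one plausible redundancy-based measure, not the paper's five actual measures: each result list is scored by the rank-discounted overlap of its top documents with the competing lists, and the highest-scoring list is selected for the query. The function names, the cutoff k, and the 1/rank discount are all illustrative assumptions.

# Hypothetical sketch of a redundancy-based list-selection heuristic.
# Assumptions (not from the paper): top-k cutoff, 1/rank discount.

def overlap_score(target, others, k=10):
    """Score a ranked list by how often its top-k documents also appear
    in the other systems' lists, weighting early ranks more heavily."""
    score = 0.0
    for rank, doc in enumerate(target[:k], start=1):
        votes = sum(doc in other for other in others)
        score += votes / rank  # redundancy vote, discounted by rank position
    return score

def select_best_list(result_lists, k=10):
    """Return the result list with the highest redundancy-based score."""
    best_idx = max(
        range(len(result_lists)),
        key=lambda i: overlap_score(
            result_lists[i],
            [l for j, l in enumerate(result_lists) if j != i],
            k,
        ),
    )
    return result_lists[best_idx]

# Example: three systems' ranked lists of document IDs for one query.
lists = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d1", "d5", "d3"],
    ["d9", "d8", "d7", "d6"],
]
print(select_best_list(lists))  # the outlier list (no shared docs) scores lowest

The intuition behind such a measure is that documents retrieved by several independent systems are more likely to be relevant, so a list that agrees with its peers near the top of the ranking is a safer choice than one that does not.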
