Learning to merge search results for efficient Distributed Information Retrieval

Merging search results from different servers is a major problem in Distributed Information Retrieval. We use Regression-SVM and Ranking-SVM to learn a function that merges results using only readily available information, i.e., the ranks, titles, summaries and URLs contained in the result pages. By not downloading additional information, such as the full documents, we decrease bandwidth usage. CORI and Round Robin merging serve as our baselines; surprisingly, our results show that the SVM methods do not improve over those baselines.
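For readers unfamiliar with the Round Robin baseline, the following is a minimal sketch of the general idea: take the top-ranked result from each server in turn, then the second-ranked, and so on. The function name `round_robin_merge`, the use of URLs as result identifiers, and the skipping of duplicate URLs are assumptions made for illustration; they are not taken from the paper's implementation.

```python
from itertools import zip_longest


def round_robin_merge(result_lists):
    """Interleave ranked result lists, one rank tier at a time.

    result_lists: one ranked list of URLs per server (hypothetical input
    format). Duplicate URLs are kept only at their first occurrence, which
    is an assumption of this sketch rather than part of the baseline's
    definition.
    """
    missing = object()  # sentinel marking servers that ran out of results
    merged = []
    seen = set()
    # zip_longest yields rank 1 from every server, then rank 2, and so on.
    for tier in zip_longest(*result_lists, fillvalue=missing):
        for url in tier:
            if url is missing or url in seen:
                continue
            seen.add(url)
            merged.append(url)
    return merged
```

Calling `round_robin_merge` with one ranked URL list per server returns a single interleaved list. Note that this baseline uses only the ranks, so it needs no document downloads, which matches the low-bandwidth setting described above.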
