New Methods of Results Merging for Distributed Information Retrieval

In distributed information retrieval systems, the result lists returned by different resources often contain overlapping documents. This is especially true of meta-search engines, which merge results from several web search engines. This paper addresses the problem of merging results by exploiting these overlaps to achieve better retrieval performance. New result-merging algorithms are proposed that use duplicate documents in two ways: one correlates the scores assigned by different resources; the other treats duplicates as additional evidence that a document is relevant to the given query. Extensive experiments demonstrate that these methods are effective.
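The two uses of duplicates can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each resource returns a mapping from document identifier to score, fits a least-squares line on the overlapping documents to bring each list onto the scale of the first list, and then applies a CombMNZ-style boost (summed score multiplied by the number of lists containing the document). The names `fit_linear_map` and `merge_with_overlaps` are hypothetical.

```python
from collections import defaultdict


def fit_linear_map(source, target):
    """Fit target ~ a * source + b by least squares on shared documents.

    Both arguments are dicts mapping a document id to a score.
    Falls back to the identity map when the overlap is too small.
    """
    shared = [doc for doc in source if doc in target]
    if len(shared) < 2:
        return 1.0, 0.0
    xs = [source[doc] for doc in shared]
    ys = [target[doc] for doc in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 1.0, 0.0
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def merge_with_overlaps(results):
    """Merge result lists; the first list is taken as the reference scale.

    Assumption for illustration: duplicates are rewarded CombMNZ-style,
    i.e. summed mapped score times the number of lists with the document.
    """
    reference = results[0]
    mapped = defaultdict(list)
    for scores in results:
        a, b = fit_linear_map(scores, reference)
        for doc, score in scores.items():
            mapped[doc].append(a * score + b)
    merged = {doc: sum(vals) * len(vals) for doc, vals in mapped.items()}
    # Highest merged score first.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    engine_a = {"a": 1.0, "b": 0.5, "c": 0.2}
    engine_b = {"a": 10.0, "b": 5.0, "d": 8.0}
    for doc, score in merge_with_overlaps([engine_a, engine_b]):
        print(doc, round(score, 3))
```

In this toy input, documents `a` and `b` appear in both lists, so they both anchor the score mapping and receive the duplicate boost, while `c` and `d` are ranked by their mapped scores alone.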
