Shadow document methods of results merging

In distributed information retrieval systems, the result lists returned by different databases frequently overlap. This is especially true of meta-search engines, which merge results from several general-purpose web search engines. This paper addresses the problem of merging overlapping results in order to achieve better retrieval performance. Several merging algorithms are proposed that exploit duplicate documents in two ways: one uses them to correlate the scores assigned by different sources; the other treats a document's appearance in more than one result list as additional evidence that it is relevant to the given query. A variety of experiments demonstrate that these methods are effective.
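The two uses of duplicates described above can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes each source returns a mapping from document id to score, uses ordinary least squares on the shared documents to map one source's scores onto the other's scale, and sums the scores of duplicates as a simple way of rewarding agreement. The function names `fit_linear` and `merge_with_overlaps` and the sample data are hypothetical.

```python
def fit_linear(pairs):
    """Least-squares fit of y = a * x + b over a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    if var_x == 0:
        # All x values are identical: fall back to a pure shift.
        return 1.0, mean_y - mean_x
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = cov / var_x
    return a, mean_y - a * mean_x


def merge_with_overlaps(results_a, results_b):
    """Merge two {doc_id: score} result sets that may share documents.

    Assumption: source A's score scale is taken as the reference scale.
    """
    shared = sorted(set(results_a) & set(results_b))
    if len(shared) >= 2:
        # Use the duplicates to correlate B's scores with A's scores.
        a, b = fit_linear([(results_b[d], results_a[d]) for d in shared])
    else:
        # Too few duplicates to fit a line: leave B's scores unchanged.
        a, b = 1.0, 0.0

    merged = dict(results_a)
    for doc, score in results_b.items():
        mapped = a * score + b
        # Duplicates accumulate evidence: their scores are summed.
        merged[doc] = merged.get(doc, 0.0) + mapped
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    engine_a = {"d1": 0.9, "d2": 0.5, "d3": 0.4}
    engine_b = {"d1": 9.0, "d2": 5.0, "d4": 7.0}
    for doc, score in merge_with_overlaps(engine_a, engine_b):
        print(doc, round(score, 3))
```

In this toy input the two sources share `d1` and `d2`, so the fitted mapping rescales the second source's 0 to 10 scores onto the first source's 0 to 1 scale before merging. Summation is only one possible way to reward duplicates; other combination rules could be substituted in the loop.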
