Learning collection fusion strategies

Collection fusion is a data fusion problem in which the results of retrieval runs on separate, autonomous document collections must be merged to produce a single, effective result. This paper explores two collection fusion techniques that learn the number of documents to retrieve from each collection using only the ranked lists of documents returned in response to past queries and those documents' relevance judgments. Retrieval experiments using the TREC test collection demonstrate that the effectiveness of the fusion techniques is within 10% of the effectiveness of a run in which the entire set of documents is treated as a single collection.
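
To make the idea of learning per-collection retrieval counts concrete, the following is a minimal sketch and not either of the two techniques studied in the paper. It assumes a deliberately simplified rule: each collection receives a share of the requested documents proportional to the number of relevant documents it returned for past queries. The function name `learn_allocation`, its parameters, and the input layout are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def learn_allocation(training_runs, total_docs, depth=100):
    """Estimate how many documents to retrieve from each collection.

    training_runs: one dict per past query, mapping a collection name to
        a list of booleans (True = judged relevant) in rank order.
    total_docs: number of documents wanted in the merged result.
    depth: only the top `depth` ranks of each past run are counted.

    Assumption (not from the paper): quotas are proportional to the
    relevant documents each collection returned for past queries.
    """
    relevant_counts = defaultdict(int)
    for run in training_runs:
        for collection, judgments in run.items():
            relevant_counts[collection] += sum(judgments[:depth])

    collections = sorted(relevant_counts)
    if not collections:
        return {}

    total_relevant = sum(relevant_counts.values())
    if total_relevant == 0:
        # No evidence at all: fall back to an even split.
        weights = {c: 1 for c in collections}
        total_relevant = len(collections)
    else:
        weights = dict(relevant_counts)

    # Integer arithmetic avoids floating-point rounding surprises.
    quotas = {c: weights[c] * total_docs // total_relevant for c in collections}
    remainders = {c: weights[c] * total_docs % total_relevant for c in collections}

    # Hand out the documents lost to truncation, largest remainder first.
    leftover = total_docs - sum(quotas.values())
    for c in sorted(collections, key=lambda c: remainders[c], reverse=True)[:leftover]:
        quotas[c] += 1
    return quotas


if __name__ == "__main__":
    past_runs = [
        {"A": [True, True, False], "B": [True, False, False]},
        {"A": [True, False], "B": [False, False]},
    ]
    print(learn_allocation(past_runs, total_docs=8))
```

In this toy input, collection A returned three relevant documents across the past queries and collection B returned one, so A is allotted three quarters of the eight requested documents. A real system would also need a rule for ordering the merged documents, which this sketch leaves out.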
