Distributed Multisearch and Resource Selection for the TREC Million Query Track

Abstract: A distributed information retrieval system with resource-selection and result-set merging capability was used to search subsets of the GOV2 document corpus for the 2008 TREC Million Query Track. The GOV2 collection was partitioned into host-name subcollections and distributed to multiple remote machines. The Multisearch demonstration application restricted each search to a fraction of the available subcollections, pre-determined by a resource-selection algorithm. Experimental results from topic-by-topic resource selection and aggregate-topic resource selection are compared. The sensitivity of Multisearch retrieval performance to variations in the resource-selection algorithm is discussed.
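
The two steps named in the abstract, partitioning by host name and restricting a search to a pre-selected fraction of subcollections, can be illustrated with a minimal sketch. This is not the Multisearch implementation: the function names `partition_by_host` and `select_resources`, the example URLs, and the per-host scores are all hypothetical, and the scoring itself (which the abstract does not specify) is simply taken as given input.

```python
import math
from collections import defaultdict
from urllib.parse import urlparse


def partition_by_host(urls):
    """Group document URLs into subcollections keyed by host name."""
    parts = defaultdict(list)
    for url in urls:
        # urlparse(...).hostname is None for malformed URLs; bucket those under "".
        host = urlparse(url).hostname or ""
        parts[host].append(url)
    return dict(parts)


def select_resources(scores, fraction):
    """Return the top `fraction` of hosts by score (at least one host).

    `scores` maps host name -> a precomputed relevance score. How such
    scores are produced is an assumption left outside this sketch.
    """
    k = max(1, math.ceil(fraction * len(scores)))
    # Sort by descending score; break ties by host name for determinism.
    ranked = sorted(scores, key=lambda host: (-scores[host], host))
    return ranked[:k]


# Hypothetical example data.
urls = [
    "http://www.nasa.gov/a.html",
    "http://www.nasa.gov/b.html",
    "http://www.epa.gov/c.html",
    "http://www.nih.gov/d.html",
]
subcollections = partition_by_host(urls)
scores = {"www.nasa.gov": 0.9, "www.epa.gov": 0.2, "www.nih.gov": 0.5}
selected = select_resources(scores, fraction=0.5)
```

In this sketch a query would then be sent only to the hosts in `selected`, and the per-host result lists would still need a separate merging step, which is not shown here.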
