Methodologies for distributed information retrieval

Text collections have traditionally been located at a single site and managed as a monolithic whole. However, it is now common for a collection to be spread over several hosts and for these hosts to be geographically separated. The authors examine several alternative approaches to distributed text retrieval. They report on their experience with a full implementation of these methods, and give retrieval efficiency and retrieval effectiveness results for collections distributed over both a local area network and a wide area network. They conclude that, compared to monolithic systems, distributed information retrieval systems can be fast and effective, but that they are not efficient.
