Performance evaluation of a distributed architecture for information retrieval

Information explosion across the Internet and elsewhere offers access to an increasing number of document collections. For users to access these collections effectively, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this paper, we describe a fully functional distributed IR system based on the Inquery unified IR system. To refine this prototype, we implement a flexible simulation model that analyzes performance issues given a wide variety of system parameters and configurations. We present a series of experiments that measure response time and system utilization and identify bottlenecks. We vary numerous system parameters, such as the number of users, text collections, terms per query, and workload, to generalize our results for other distributed IR systems. Based on our initial results, we recommend simple changes to the prototype and evaluate the changes using the simulator. Because of the significant resource demands of information retrieval, it is not difficult to generate workloads that overwhelm system resources regardless of the architecture. However, under some realistic workloads, we demonstrate system organizations for which response time degrades gracefully as the workload increases and performance scales with the number of processors. This scalable architecture includes a surprisingly small number of brokers through which a large number of clients and servers communicate.
