Performance Comparison of Clustered and Replicated Information Retrieval Systems

The amount of information available on the Internet grows daily, and with it the importance and scale of Web search engines. Systems based on a single centralised index have several problems, such as a lack of scalability, which has led to the use of distributed information retrieval systems to search for and locate the required information effectively. A distributed retrieval system can be clustered and/or replicated. In this paper, we use simulations to present a detailed performance analysis, in terms of both throughput and response time, of a clustered system compared with a replicated system. In addition, we consider the effect of changes in the query topics over time. We show that the performance of a clustered system does not exceed that of the best replicated system. Indeed, the main advantage of a clustered system is the reduction of network traffic. However, the use of a switched network eliminates the network bottleneck, markedly improving the performance of the replicated systems. Moreover, we illustrate the negative effect that changes in the query topics over time have on performance when a distributed clustered system is used. In contrast, the performance of a distributed replicated system is query independent.
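The contrast between query-dependent and query-independent load can be illustrated with a toy sketch. This is not the simulation model used in the paper: it assumes one server per topic cluster, round-robin dispatch across full replicas, and unit cost per query, and the names `replicated_load`, `clustered_load`, and `imbalance` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter
from itertools import cycle


def replicated_load(queries, num_servers):
    """Round-robin over full replicas: any server can answer any query."""
    load = Counter({s: 0 for s in range(num_servers)})
    servers = cycle(range(num_servers))
    for _ in queries:
        load[next(servers)] += 1
    return load


def clustered_load(queries, topic_to_server, num_servers):
    """Each query goes only to the server holding its topic's cluster."""
    load = Counter({s: 0 for s in range(num_servers)})
    for topic in queries:
        load[topic_to_server[topic]] += 1
    return load


def imbalance(load):
    """Busiest server's load divided by the mean load (1.0 means balanced)."""
    mean = sum(load.values()) / len(load)
    return max(load.values()) / mean


# Assumed toy workload: four topics, one cluster server per topic.
topic_to_server = {"a": 0, "b": 1, "c": 2, "d": 3}
balanced = ["a", "b", "c", "d"] * 100
# Query topics drift over time: topic "a" now dominates the stream.
shifted = ["a"] * 280 + ["b"] * 40 + ["c"] * 40 + ["d"] * 40

print(imbalance(replicated_load(balanced, 4)))                  # 1.0
print(imbalance(replicated_load(shifted, 4)))                   # 1.0
print(imbalance(clustered_load(balanced, topic_to_server, 4)))  # 1.0
print(imbalance(clustered_load(shifted, topic_to_server, 4)))   # 2.8
```

Under these assumptions the replicated dispatcher stays balanced whatever the topic mix, while the clustered one concentrates load on a single server once the topics drift. The sketch deliberately ignores network traffic, which is where the abstract places the main advantage of clustering.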
