Performance Evaluation of Large-scale Information Retrieval Systems Scaling Down

The performance evaluation of an IR system is a key point in the development of any search engine, especially on the Web. To achieve the performance users have come to expect, Web search engines rely on large-scale distributed systems, and optimising their performance is an important topic in the literature. The main methods found in the literature for analysing the performance of a distributed IR system are an analytical model, a simulation model, and a real search engine. An analytical or simulation model may omit some details, which leads to differences between the real and the estimated performance. A real system yields more precise results, but the resources required to build a large-scale search engine are excessive. In this paper we propose to study performance by building a scaled-down version of a search engine, using virtualization tools to create a realistic distributed system. Scaling down a distributed IR system maintains the behaviour of the whole system while reducing its hardware requirements. This makes it possible to use virtualization tools to build a large-scale distributed system on just a small cluster of computers.
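The arithmetic behind the scaling-down argument can be illustrated with a small sketch. It rests on assumptions that are not stated in the abstract: that an index server's memory footprint and per-query CPU cost grow linearly with the number of documents it indexes, and that each scaled-down virtual machine receives the same fraction of a physical core as its partition was shrunk. Under those assumptions, per-server utilisation (arrival rate times effective service time) is unchanged, while far fewer physical hosts are needed. The names `IndexServer`, `scale_down`, `utilisation` and `hosts_needed`, and all numbers, are hypothetical. This is not the authors' method.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class IndexServer:
    """Hypothetical resource profile of one index server."""
    docs: int                            # documents in this server's partition
    index_mb: int                        # RAM needed to hold the partition's index
    service_ms: Fraction                 # CPU time per query on one full core
    cpu_share: Fraction = Fraction(1)    # fraction of a physical core assigned


def scale_down(server: IndexServer, factor: int) -> IndexServer:
    """Shrink the partition by `factor`.

    ASSUMPTION: index size and per-query cost are proportional to the number
    of documents, and the virtual machine gets 1/factor of the original CPU
    share, so its effective service time is unchanged.
    """
    return IndexServer(
        docs=server.docs // factor,
        index_mb=server.index_mb // factor,
        service_ms=Fraction(server.service_ms) / factor,
        cpu_share=Fraction(server.cpu_share) / factor,
    )


def utilisation(server: IndexServer, queries_per_s) -> Fraction:
    """rho = lambda * S, where S is stretched when only part of a core is available."""
    effective_ms = Fraction(server.service_ms) / Fraction(server.cpu_share)
    return Fraction(queries_per_s) * effective_ms / 1000


def hosts_needed(n_servers: int, server: IndexServer, host_mb: int, host_cores: int) -> int:
    """Physical hosts needed to pack n_servers index servers, limited by RAM and cores."""
    by_mem = host_mb // server.index_mb
    by_cpu = (Fraction(host_cores) / Fraction(server.cpu_share)).__floor__()
    per_host = min(by_mem, by_cpu)
    if per_host == 0:
        raise ValueError("a single index server does not fit on one host")
    return (n_servers + per_host - 1) // per_host   # ceiling division


if __name__ == "__main__":
    full = IndexServer(docs=10_000_000, index_mb=16_000, service_ms=Fraction(40))
    small = scale_down(full, factor=10)
    for name, srv in (("full", full), ("scaled", small)):
        print(name, float(utilisation(srv, 20)), hosts_needed(100, srv, 32_000, 8))
```

With these made-up figures, 100 full-size servers need 50 hosts (two 16 GB indexes per 32 GB host), while the scaled-down system packs 20 virtual servers per host and fits on 5 hosts at the same utilisation of 0.8. Real indexing and query costs need not be exactly linear in collection size, so this only illustrates the reasoning, not a guarantee that behaviour is preserved.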
