Basic issues on the processing of web queries

In this paper we study three basic issues in Web query processing: load balance, broker behavior, and the performance of individual index servers. Our study, while preliminary, reveals interesting tradeoffs: (1) load imbalance at low query arrival rates can be controlled by the simple measure of randomizing the distribution of documents among the index servers, (2) the broker is not a bottleneck, and (3) disk utilization is higher than CPU utilization.
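As an illustration of the first observation, randomizing the distribution of documents among index servers might look like the following minimal sketch. It is not the paper's implementation: the function name `distribute_documents`, its parameters, and the shuffle-then-deal strategy are assumptions chosen for illustration only.

```python
import random


def distribute_documents(doc_ids, num_servers, seed=None):
    """Randomly spread documents over index servers (hypothetical helper).

    Assumption: documents are shuffled and then dealt out round-robin,
    so each server receives a random subset of near-equal size.
    """
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")
    rng = random.Random(seed)      # private generator, reproducible with a seed
    shuffled = list(doc_ids)       # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)
    # Server s receives every num_servers-th document, starting at offset s.
    return [shuffled[s::num_servers] for s in range(num_servers)]


# Usage: spread 10 documents over 3 hypothetical index servers.
partitions = distribute_documents(range(10), num_servers=3, seed=42)
```

Because the assignment ignores document order, documents that were crawled or stored together are unlikely to be concentrated on a single server.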
