Optimizing Result Prefetching in Web Search Engines with Segmented Indices

Publisher Summary

The sheer size of the World Wide Web (WWW), and the efforts of search engines to index significant portions of it, have led many search engines to partition their inverted index of the Web into several disjoint segments (partial indices). This partitioning affects how the engines process queries. Most engines also use some form of query result caching, in which the results of queries that have been served are cached for some time. In particular, query results may be prefetched in anticipation of user requests. This happens when the engine retrieves more results for a query than it initially returns to the user. Search engine users have been observed to browse through very few pages of results for the queries they submit. This behavior suggests that prefetching many results when processing an initial query is inefficient, because the user who initiated the search will not request most of the prefetched results. However, a policy that abandons result prefetching in favor of retrieving just the first page of search results may also fail to make optimal use of system resources.
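
To make the trade-off concrete, here is a minimal Python sketch under stated assumptions. Each hypothetical `Segment` is an in-memory dict from a query string to (score, doc_id) pairs. A query is answered by asking every disjoint segment for its local top-k and merging the results. A toy `PrefetchingCache` then retrieves `prefetch_pages` extra pages beyond the requested one on each cache miss. The names, the data layout, and the fixed-depth prefetch policy are illustrative assumptions only. They are not the paper's model or any search engine's API.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical data model (an assumption, not the paper's): each segment is a
# partial index mapping a query string to its matching (score, doc_id) pairs.
# Segments are disjoint, so each document is indexed in only one segment.
Result = Tuple[float, str]
Segment = Dict[str, List[Result]]


def top_k(segments: List[Segment], query: str, k: int) -> List[Result]:
    """Return the global top-k results for `query` across all segments.

    Each segment contributes its local top-k. Since the segments are disjoint,
    the global top-k is always contained in the union of these local lists.
    """
    candidates: List[Result] = []
    for seg in segments:
        candidates.extend(heapq.nlargest(k, seg.get(query, [])))
    return heapq.nlargest(k, candidates)


class PrefetchingCache:
    """Toy query-result cache that prefetches extra pages on each miss."""

    def __init__(self, segments: List[Segment], page_size: int = 10,
                 prefetch_pages: int = 0) -> None:
        self.segments = segments
        self.page_size = page_size
        self.prefetch_pages = prefetch_pages  # extra pages fetched per miss
        # query -> (retrieval depth, ranked results up to that depth)
        self.cache: Dict[str, Tuple[int, List[Result]]] = {}
        self.misses = 0  # number of times the segmented index was queried

    def get_page(self, query: str, page: int) -> List[Result]:
        """Return results page `page` (0-based) for `query`."""
        start = page * self.page_size
        end = start + self.page_size
        entry = self.cache.get(query)
        if entry is None or entry[0] < end:
            # Miss: retrieve the requested page plus `prefetch_pages` more.
            self.misses += 1
            depth = end + self.prefetch_pages * self.page_size
            entry = (depth, top_k(self.segments, query, depth))
            self.cache[query] = entry
        return entry[1][start:end]


if __name__ == "__main__":
    segments: List[Segment] = [
        {"web": [(0.9, "a"), (0.5, "b"), (0.1, "c")]},
        {"web": [(0.8, "d"), (0.4, "e")]},
        {"web": [(0.7, "f"), (0.3, "g"), (0.2, "h")]},
    ]
    for prefetch in (0, 1):
        cache = PrefetchingCache(segments, page_size=2, prefetch_pages=prefetch)
        cache.get_page("web", 0)  # user views the first page
        cache.get_page("web", 1)  # user then asks for the second page
        print(f"prefetch_pages={prefetch}: index queried {cache.misses} time(s)")
```

In this toy run, a user who reads two pages costs two passes over the segments without prefetching but only one with `prefetch_pages=1`. However, every miss in the prefetching configuration asks each segment for twice as many results, and that work is wasted when the user stops after the first page. Balancing these two costs is the trade-off the summary describes.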
