Efficient Parallel Block-Max WAND Algorithm

Large Web search engines are complex systems that answer thousands of user queries per second on clusters of dedicated distributed-memory processors. Processing each query involves a number of operations that together produce the answer presented to the user. The most expensive of these in running time is computing the top-k documents that best match the query. In this paper we propose a parallelization of Block-Max WAND, a state-of-the-art document ranking algorithm. We propose a two-step parallelization of the WAND algorithm that reduces both inter-processor communication and running time. We also propose multi-threading tailored to Block-Max WAND to exploit multi-core parallelism within each processor. The experimental results show that the proposed parallelization significantly reduces execution time compared with current approaches used in search engines.
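
To make the top-k setting concrete, the following is a minimal sketch of partitioned top-k retrieval: each index partition computes a local top-k, and a merge step combines the partial results into a global top-k. It is a generic illustration only, not the two-step scheme or the Block-Max WAND pruning proposed in the paper; the function names `local_top_k` and `merge_top_k`, the precomputed score dictionaries, and the sample data are all assumptions made for this example.

```python
import heapq
from typing import Dict, List, Tuple

Scored = Tuple[float, int]  # (score, doc_id)


def local_top_k(scores: Dict[int, float], k: int) -> List[Scored]:
    """Return the k best (score, doc_id) pairs of one partition, best first.

    `scores` stands in for the per-document scores a real ranking
    algorithm would compute while traversing the inverted lists.
    """
    return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


def merge_top_k(partials: List[List[Scored]], k: int) -> List[Scored]:
    """Merge per-partition results into a global top-k, best first."""
    return heapq.nlargest(k, (pair for part in partials for pair in part))


# Hypothetical document scores, one dictionary per index partition.
partitions = [
    {1: 0.9, 2: 0.4, 3: 0.7},
    {10: 0.8, 11: 0.2},
    {20: 0.95, 21: 0.1, 22: 0.6},
]

partials = [local_top_k(p, k=2) for p in partitions]
result = merge_top_k(partials, k=2)
print(result)
```

In this sketch every partition sends k candidates to the merger regardless of how competitive they are, which is the kind of communication cost a more careful parallel design would aim to cut.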
