Intra-query Concurrent Pipelined Processing for Distributed Full-Text Retrieval

Pipelined query processing over a term-wise distributed inverted index has superior throughput at high query multiprogramming levels. However, because of its long query latencies, this approach is inefficient at lower levels. In this paper we explore two types of intra-query parallelism within the pipelined approach: parallel execution of a query on different nodes, and concurrent execution on the same node. According to our experimental results, our approach reaches the throughput of the state-of-the-art method at about half the latency. In the single-query case, the observed latency improvement is up to 2.6 times.
