Posting list intersection on multicore architectures

In current commercial Web search engines, queries are processed in conjunctive mode, which requires the engine to intersect the posting lists of the query terms in order to find the documents that match all of them. In practice, the intersection operation takes a significant fraction of the query processing time and, for some queries, dominates the total query latency. Hence, efficient posting list intersection is critical for achieving short query latencies. In this work, we focus on improving the performance of posting list intersection by leveraging the compute capabilities of recent multicore systems. To this end, we consider various coarse-grained and fine-grained parallelization models for list intersection. Specifically, we present an algorithm that partitions the work associated with a given query into a number of small, independent tasks that are subsequently processed in parallel. Through a detailed empirical analysis of these alternative models, we demonstrate that exploiting parallelism at the finest level of granularity is critical to achieving the best performance on multicore systems. On an eight-core system, the fine-grained parallelization method reduces average query processing time by more than a factor of five while still exploiting parallelism for high query throughput.
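The idea of splitting one query's intersection into small independent tasks can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes posting lists are in-memory Python lists of strictly increasing document IDs, and the names `intersect_range` and `parallel_intersect` and the parameters `num_tasks` and `workers` are hypothetical. The shortest list is cut into chunks, each chunk defines a document-ID range, and every range is intersected separately across all lists.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor


def intersect_range(lists, lo, hi):
    """Intersect all sorted lists, restricted to doc IDs in [lo, hi]."""
    slices = []
    for postings in lists:
        # Binary search locates the part of each list that falls in the range.
        start = bisect_left(postings, lo)
        end = bisect_right(postings, hi)
        slices.append(postings[start:end])
    # Start from the smallest slice so intermediate results stay small.
    slices.sort(key=len)
    result = slices[0]
    for other in slices[1:]:
        if not result:
            break
        members = set(other)
        result = [doc for doc in result if doc in members]
    return result


def parallel_intersect(lists, num_tasks=8, workers=4):
    """Intersect sorted posting lists using independent range tasks.

    Assumption: each list holds strictly increasing doc IDs.
    """
    if not lists or any(not postings for postings in lists):
        return []
    shortest = min(lists, key=len)
    chunk = max(1, -(-len(shortest) // num_tasks))  # ceiling division
    # Each task covers the doc ID range spanned by one chunk of the
    # shortest list; the ranges are disjoint and in ascending order.
    bounds = [
        (shortest[i], shortest[min(i + chunk, len(shortest)) - 1])
        for i in range(0, len(shortest), chunk)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: intersect_range(lists, b[0], b[1]), bounds)
        # map() yields results in task order, so the output stays sorted.
        return [doc for part in parts for doc in part]
```

Because the tasks share no mutable state, they can run in any order on any core. In standard CPython, threads do not execute this pure-Python work on several cores at once, so the sketch shows the task decomposition rather than the speedup; a measurable gain would need processes or a language with native threads.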
