SIMD-based decoding of posting lists

Powerful SIMD instructions in modern processors offer an opportunity for greater search performance. In this paper, we apply these instructions to decoding search engine posting lists. We start by exploring variable-length integer encoding formats used to represent postings. We define two properties, byte-oriented and byte-preserving, that characterize many formats of interest. Based on their common structure, we define a taxonomy that classifies encodings along three dimensions, representing the way in which data bits are stored and additional bits are used to describe the data. Using this taxonomy, we discover new encoding formats, some of which are particularly amenable to SIMD-based decoding. We present generic SIMD algorithms for decoding these formats. We also extend these algorithms to the most common traditional encoding format. Our experiments demonstrate that SIMD-based decoding algorithms are up to 3 times faster than non-SIMD algorithms.
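To make the term "byte-oriented variable-length encoding" concrete, the following is a minimal scalar sketch of one widely used format of this kind. It assumes a layout in which each byte carries 7 data bits, least-significant group first, and uses its high bit as a continuation flag. The abstract does not specify the bit layout of the traditional format it refers to, so this layout and the function names `encode_varint` and `decode_varints` are illustrative assumptions, not the paper's definitions. It is also not a SIMD algorithm. It shows only the kind of byte-at-a-time loop that SIMD decoding aims to replace.

```python
from typing import List


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 data bits per byte (assumed layout)."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # next 7 data bits, least-significant first
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte of this integer
            return bytes(out)


def decode_varints(data: bytes) -> List[int]:
    """Decode a concatenation of encoded integers one byte at a time."""
    values = []
    value = 0
    shift = 0
    for b in data:
        value |= (b & 0x7F) << shift  # place the 7 data bits at their position
        if b & 0x80:
            shift += 7                # continuation: keep accumulating
        else:
            values.append(value)      # terminator byte: emit and reset
            value = 0
            shift = 0
    if shift:
        raise ValueError("truncated input: last integer is incomplete")
    return values


postings = [1, 127, 128, 300, 16384]
encoded = b"".join(encode_varint(p) for p in postings)
decoded = decode_varints(encoded)
```

The data-dependent branch on the continuation bit in `decode_varints` is taken once per input byte. That per-byte control flow is what a decoder working on many bytes at once would need to avoid.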
