Compression, SIMD, and Postings Lists

The three generations of postings list compression strategies (Variable Byte Encoding, Word Aligned Codes, and SIMD Codecs) are examined to test whether each truly represented a generational change; the finding is that each did. Some weaknesses of current SIMD-based schemes are identified, and a new scheme, QMX, is introduced to address both space and decoding inefficiencies. Improvements are examined on multiple architectures, and it is shown that the SSE implementations of different vendors (Intel and AMD) perform differently.
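For readers unfamiliar with the first generation named above, the following is a minimal sketch of Variable Byte Encoding applied to a postings list. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names (`to_gaps`, `vbyte_encode`, `vbyte_decode`, `from_gaps`) are hypothetical, and the convention used here (7 payload bits per byte, low-order bits first, high bit set on the final byte of each integer) is only one of several variants in use.

```python
from typing import Iterable, List


def to_gaps(doc_ids: Iterable[int]) -> List[int]:
    """Convert strictly increasing document ids into differences (d-gaps)."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps: Iterable[int]) -> List[int]:
    """Invert to_gaps by taking a running sum."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(values: Iterable[int]) -> bytes:
    """Encode non-negative integers, 7 payload bits per byte.

    Assumed convention: low-order bits first, and the high bit (0x80)
    marks the last byte of each integer.
    """
    out = bytearray()
    for value in values:
        if value < 0:
            raise ValueError("variable byte encoding needs non-negative integers")
        while value >= 128:
            out.append(value & 0x7F)  # continuation byte, high bit clear
            value >>= 7
        out.append(value | 0x80)      # final byte, high bit set
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Decode a byte string produced by vbyte_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:               # final byte of this integer
            values.append(current | ((byte & 0x7F) << shift))
            current, shift = 0, 0
        else:                         # more bytes follow
            current |= byte << shift
            shift += 7
    return values


if __name__ == "__main__":
    doc_ids = [3, 7, 130, 1000, 100000]
    encoded = vbyte_encode(to_gaps(doc_ids))
    assert from_gaps(vbyte_decode(encoded)) == doc_ids
    print(len(encoded), "bytes")
```

For the five example ids, the gaps are 3, 4, 123, 870, and 99000, which occupy 8 bytes in this encoding compared with 20 bytes as fixed 32-bit integers. The decoder branches on every byte, which is the kind of per-byte cost that word-aligned and SIMD schemes aim to reduce.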
