Fast Block-Compressed Inverted Lists

New techniques for compressing and storing inverted lists are presented. Unlike previous research, these techniques are designed specifically for volatile inverted lists. They combine different types of compression (including prefix compression) with block segmentation to allow easy insertion and deletion of pointers and, most importantly, to significantly reduce execution times while keeping storage requirements close to those of a baseline monolithic inverted list implementation based on Elias codes. Inverted lists for information retrieval are addressed, and experiments are reported. The best method uses an optimized block-oriented evaluation that efficiently skips irrelevant pointers; its observed average execution time is less than 65% of that of the baseline implementation.
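To make the idea of block segmentation concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes that each block stores its first pointer in full and the remaining pointers as gaps, and that the last pointer of each block is kept uncompressed so that whole blocks can be skipped during lookup. The names `Block`, `BlockedList`, and `BLOCK_SIZE` are hypothetical, and the gaps are left as plain integers rather than being bit-encoded.

```python
from bisect import bisect_left

BLOCK_SIZE = 4  # hypothetical capacity, kept tiny for illustration


class Block:
    """One block: first pointer stored in full, the rest as gaps."""

    def __init__(self, doc_ids):
        self.first = doc_ids[0]
        self.last = doc_ids[-1]  # kept uncompressed so the block can be skipped
        self.gaps = [b - a for a, b in zip(doc_ids, doc_ids[1:])]

    def decode(self):
        out = [self.first]
        for gap in self.gaps:
            out.append(out[-1] + gap)
        return out


class BlockedList:
    """Sorted pointer list segmented into small gap-encoded blocks."""

    def __init__(self, doc_ids=()):
        ids = sorted(set(doc_ids))
        self.blocks = [Block(ids[i:i + BLOCK_SIZE])
                       for i in range(0, len(ids), BLOCK_SIZE)]

    def _find_block(self, doc_id):
        # Index of the first block whose last pointer is >= doc_id.
        # Rebuilding `lasts` on every call is only acceptable in a sketch.
        lasts = [b.last for b in self.blocks]
        return bisect_left(lasts, doc_id)

    def contains(self, doc_id):
        i = self._find_block(doc_id)
        # Only the single candidate block is decoded; all others are skipped.
        return i < len(self.blocks) and doc_id in self.blocks[i].decode()

    def insert(self, doc_id):
        if not self.blocks:
            self.blocks.append(Block([doc_id]))
            return
        i = min(self._find_block(doc_id), len(self.blocks) - 1)
        ids = self.blocks[i].decode()
        if doc_id in ids:
            return
        ids.insert(bisect_left(ids, doc_id), doc_id)
        if len(ids) > BLOCK_SIZE:
            # Split an overfull block; only this block is re-encoded.
            mid = len(ids) // 2
            self.blocks[i:i + 1] = [Block(ids[:mid]), Block(ids[mid:])]
        else:
            self.blocks[i] = Block(ids)

    def to_list(self):
        return [d for b in self.blocks for d in b.decode()]
```

In this sketch an insertion touches a single block, whereas a monolithic gap-encoded list would generally have to be decoded and re-encoded from the insertion point onward. A membership test likewise decodes one block instead of scanning the whole list.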
