Efficient Grammar Generation for Inverted Indexes

Inverted indexes are commonly used in large-scale search engines to store the lists of document identifiers (docIDs) relevant to each query term, and these lists may be queried thousands of times per second. Traditionally, optimized integer-sequence encoding methods are applied to compress the inverted index while keeping query processing reasonably fast. Recently, a method based on context-free grammars was introduced for inverted index compression; it is particularly useful for highly repetitive indexes.
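
To make the idea of grammar-based compression concrete, the following is a minimal sketch, not the method of the work summarized above. It assumes a posting list is first turned into gaps between consecutive docIDs and then compressed with a simple Re-Pair-style scheme that repeatedly replaces the most frequent adjacent pair with a new grammar symbol. The function names (`to_gaps`, `from_gaps`, `build_grammar`, `expand`) and the use of negative integers as nonterminals are illustrative assumptions.

```python
from collections import Counter


def to_gaps(doc_ids):
    """Turn a strictly increasing docID list into gaps (first docID kept as-is)."""
    if not doc_ids:
        return []
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]


def from_gaps(gaps):
    """Inverse of to_gaps: a running sum restores the docIDs."""
    doc_ids, total = [], 0
    for g in gaps:
        total += g
        doc_ids.append(total)
    return doc_ids


def build_grammar(seq):
    """Re-Pair-style sketch: replace the most frequent adjacent pair until none repeats.

    Gaps are positive, so negative integers are used as nonterminal symbols
    (an assumption of this sketch). Returns (compressed sequence, rules).
    """
    seq = list(seq)
    rules = {}
    next_symbol = -1
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        rules[next_symbol] = pair
        out, i = [], 0
        while i < len(seq):
            # Scan left to right, replacing non-overlapping occurrences.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_symbol -= 1
    return seq, rules


def expand(seq, rules):
    """Expand nonterminals back into the original gap sequence."""
    out, stack = [], list(reversed(seq))
    while stack:
        symbol = stack.pop()
        if symbol in rules:
            left, right = rules[symbol]
            stack.append(right)
            stack.append(left)
        else:
            out.append(symbol)
    return out


doc_ids = [3, 5, 6, 8, 9, 11, 12, 14, 15]
gaps = to_gaps(doc_ids)
compressed, rules = build_grammar(gaps)
restored = from_gaps(expand(compressed, rules))
```

In this toy list the gap pattern repeats, so the compressed sequence is shorter than the gap sequence and `restored` equals `doc_ids`. A practical index would also have to encode the rules compactly and support query processing directly on the compressed form, which this sketch does not attempt.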
