Paper information - Compact inverted index storage using general‐purpose compression libraries

Efficient storage of large inverted indexes is one of the key technologies that support current web search services. Here we re‐examine mechanisms for representing document‐level inverted indexes and within‐document term frequencies, including comparing specialized methods developed for this task against recent fast implementations of general‐purpose adaptive compression techniques. Experiments with the Gov2‐URL collection and a large collection of crawled news stories show that standard compression libraries can provide compression effectiveness as good as or better than previous methods, with decoding rates only moderately slower than reference implementations of those tailored approaches. This surprising outcome means that high‐performance index compression can be achieved without requiring the use of specialized implementations.
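As a rough illustration of the idea summarized above, the sketch below stores one document-level postings list by converting document identifiers to gaps, serializing the gaps as bytes, and passing the result to a general-purpose compression library. The byte-aligned serialization, the helper names (`to_gaps`, `vbyte_encode`, `compress_postings`, and so on), and the use of Python's `zlib` module are assumptions made for illustration only; they are not necessarily the codecs or configuration evaluated in the paper.

```python
import zlib


def to_gaps(doc_ids):
    """Turn a strictly increasing list of document ids into d-gaps."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Invert to_gaps by taking a running sum."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def vbyte_encode(values):
    """Byte-aligned code: 7 payload bits per byte, low-order bits first.
    The high bit is set on the final byte of each value."""
    out = bytearray()
    for v in values:
        while v >= 128:
            out.append(v & 0x7F)
            v >>= 7
        out.append(v | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    values, cur, shift = [], 0, 0
    for b in data:
        if b & 0x80:  # final byte of the current value
            values.append(cur | ((b & 0x7F) << shift))
            cur, shift = 0, 0
        else:
            cur |= b << shift
            shift += 7
    return values


def compress_postings(doc_ids, level=6):
    """Gap-encode, serialize, then hand the bytes to zlib
    (an assumed stand-in for a general-purpose library)."""
    return zlib.compress(vbyte_encode(to_gaps(doc_ids)), level)


def decompress_postings(blob):
    """Recover the original document ids from a compressed blob."""
    return from_gaps(vbyte_decode(zlib.decompress(blob)))


if __name__ == "__main__":
    postings = [3, 7, 8, 20, 1000, 1001, 70000]
    blob = compress_postings(postings)
    print(decompress_postings(blob) == postings)
```

Very short lists such as the one in the usage example will not shrink, because the library's fixed overhead dominates; any benefit would only be expected on long postings lists or on many lists compressed together in one block.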

Alistair Moffat | Matthias Petri
