Inverted Index Compression Using Word-Aligned Binary Codes

We examine index representation techniques for document-based inverted files, and present a mechanism for compressing them using word-aligned binary codes. The new approach allows extremely fast decoding of inverted lists during query processing, while providing compression rates better than other high-throughput representations. Results are given for several large text collections in support of these claims, both for compression effectiveness and query efficiency.
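
The abstract does not give the code layout, so the following is only a minimal sketch of what a word-aligned binary code can look like, not the authors' exact scheme. It assumes a 32-bit word split into a 4-bit selector and 28 data bits, with the selector choosing how many equal-width integers share the word. The names `LAYOUTS`, `to_gaps`, `encode`, and `decode` are hypothetical, and the input is assumed to be a sorted list of document numbers stored as gaps.

```python
# Assumed (count, bits) layouts for the 28 data bits of one 32-bit word.
# The 4-bit selector stored in the top bits is the index into this table.
LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5),
           (4, 7), (3, 9), (2, 14), (1, 28)]


def to_gaps(doc_ids):
    """Turn a strictly increasing list of document numbers into gaps."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def encode(values):
    """Greedily pack non-negative integers below 2**28 into 32-bit words."""
    words = []
    i = 0
    while i < len(values):
        for selector, (count, bits) in enumerate(LAYOUTS):
            chunk = values[i:i + count]
            # A layout applies only if a full chunk is left and every value fits.
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28
                for k, v in enumerate(chunk):
                    word |= v << (k * bits)
                words.append(word)
                i += count
                break
        else:
            raise ValueError("value does not fit in 28 bits")
    return words


def decode(words):
    """Unpack words produced by encode; one table lookup and fixed shifts per word."""
    values = []
    for word in words:
        count, bits = LAYOUTS[word >> 28]
        mask = (1 << bits) - 1
        for k in range(count):
            values.append((word >> (k * bits)) & mask)
    return values


doc_ids = [3, 4, 7, 8, 9, 20, 1000]
gaps = to_gaps(doc_ids)
words = encode(gaps)
```

Decoding in this sketch needs no bit-by-bit parsing: each word is read whole, the selector picks a fixed mask and shift, and the integers are extracted in a short loop. That is the kind of property the abstract points to when it claims fast decoding of inverted lists.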
