Compressed Inverted Indexes for In-Memory Search Engines

We present the algorithmic core of a full-text database that supports fast Boolean queries, phrase queries, and document reporting while using less space than the input text. The system combines classical data compression techniques with inverted-index-based search data structures in a carefully coordinated way. For real-world (natural language) texts, it outperforms suffix-array-based techniques on all of these operations.
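The abstract does not say which compression codes or search structures the system actually uses. The Python sketch below is only a rough illustration of the general idea. It stores each word's positions in the concatenated text as gaps, compresses them with variable-byte coding, and answers the three query types named above: Boolean AND, phrase queries, and document reporting. Variable-byte coding is a classical code substituted here for brevity and is not necessarily the paper's choice. The names `CompressedIndex`, `vbyte_encode`, and `vbyte_decode` are hypothetical.

```python
import bisect
from collections import defaultdict


def vbyte_encode(n: int) -> bytearray:
    """Variable-byte code: 7 payload bits per byte, low bits first;
    the high bit marks the final byte of each integer."""
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return out


def vbyte_decode(data: bytes) -> list[int]:
    """Decode a concatenation of variable-byte coded integers."""
    values, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:  # terminating byte of the current integer
            values.append(n)
            n, shift = 0, 0
        else:
            shift += 7
    return values


class CompressedIndex:
    """Toy positional inverted index over the concatenation of all documents.
    Each posting list holds global word positions stored as vbyte-coded gaps."""

    def __init__(self, docs: list[str]):
        self.doc_starts = []  # global position of each document's first word
        raw = defaultdict(list)
        pos = 0
        for doc in docs:
            self.doc_starts.append(pos)
            for word in doc.lower().split():
                raw[word].append(pos)
                pos += 1
        self.postings = {}
        for word, plist in raw.items():
            buf, prev = bytearray(), 0
            for p in plist:  # positions are already increasing
                buf += vbyte_encode(p - prev)
                prev = p
            self.postings[word] = bytes(buf)

    def positions(self, word: str) -> list[int]:
        """Decompress a posting list by prefix-summing the decoded gaps."""
        out, total = [], 0
        for gap in vbyte_decode(self.postings.get(word.lower(), b"")):
            total += gap
            out.append(total)
        return out

    def doc_of(self, pos: int) -> int:
        """Document reporting: map a global position to its document id."""
        return bisect.bisect_right(self.doc_starts, pos) - 1

    def docs_with(self, word: str) -> set[int]:
        return {self.doc_of(p) for p in self.positions(word)}

    def boolean_and(self, *words: str) -> set[int]:
        result = self.docs_with(words[0])
        for w in words[1:]:
            result &= self.docs_with(w)
        return result

    def phrase(self, text: str) -> set[int]:
        words = text.lower().split()
        # Candidate start positions: occurrences of the first word, kept only
        # if word i also occurs exactly i positions later.
        starts = set(self.positions(words[0]))
        for i, w in enumerate(words[1:], start=1):
            starts &= {p - i for p in self.positions(w)}
        # Discard matches that straddle a document boundary.
        last = len(words) - 1
        return {self.doc_of(p) for p in starts
                if self.doc_of(p) == self.doc_of(p + last)}


if __name__ == "__main__":
    idx = CompressedIndex(["the quick brown fox", "the lazy dog", "quick brown dogs run"])
    print(sorted(idx.boolean_and("quick", "brown")))  # [0, 2]
    print(sorted(idx.phrase("the lazy")))             # [1]
    print(sorted(idx.phrase("fox the")))              # [] (crosses a document boundary)
```

Storing gaps instead of absolute positions keeps most values small, so most of them fit in a single byte. A real system would presumably use tighter codes (for example Golomb codes) and faster list intersection than Python set operations.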
