Directly Addressable Variable-Length Codes

We introduce a symbol reordering technique that implicitly synchronizes variable-length codes, making it possible to access the i-th codeword directly, without any sampling method. The technique is practical and has many applications, including the representation of ordered sets, sparse bitmaps, and partial sums, as well as compressed data structures for suffix trees and arrays and for inverted indexes. We show experimentally that the technique offers a competitive alternative to other data structures that handle this problem.
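
The abstract does not spell out the reordering, so the following is only a minimal sketch of one way such a scheme can work, under stated assumptions: each non-negative integer is cut into fixed b-bit chunks, the k-th chunks of all values are stored together in level k, and a per-level continuation flag says whether a value has more chunks. The class name `DAC`, the parameter `b`, and the use of plain Python lists with prefix counts in place of a compact rank structure are illustrative choices, not the authors' implementation.

```python
class DAC:
    """Toy directly addressable code: b-bit chunks reordered into levels."""

    def __init__(self, values, b=2):
        self.b = b
        self.chunks = []  # chunks[k][j]: k-th chunk of the j-th value reaching level k
        self.ranks = []   # ranks[k][j]: continuation flags set in level k before position j
        mask = (1 << b) - 1
        level = list(values)  # assumed to be non-negative integers
        while level:
            self.chunks.append([v & mask for v in level])
            flags = [(v >> b) > 0 for v in level]  # True if more chunks follow
            prefix = [0]
            for f in flags:
                prefix.append(prefix[-1] + f)
            self.ranks.append(prefix)
            # Only values with remaining bits continue to the next level.
            level = [v >> b for v, f in zip(level, flags) if f]

    def access(self, i):
        """Return the i-th value without decoding the values before it."""
        value, shift, k = 0, 0, 0
        while True:
            value |= self.chunks[k][i] << shift
            r = self.ranks[k]
            if r[i + 1] == r[i]:  # flag not set: this was the last chunk
                return value
            i = r[i]              # rank gives the position in the next level
            shift += self.b
            k += 1


values = [5, 0, 19, 2, 64]
dac = DAC(values, b=2)
decoded = [dac.access(i) for i in range(len(values))]
```

Each access touches one chunk per level, so its cost depends on the length of the requested codeword rather than on its position in the sequence. A space-efficient version would replace the prefix-count lists with bitmaps supporting constant-time rank.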
