A Simple Online Competitive Adaptation of Lempel-Ziv Compression with Efficient Random Access Support

We present a simple adaptation of the Lempel-Ziv '78 (LZ78) compression scheme that supports efficient random access to the input string. The compression algorithm takes a parameter ε > 0 as input and, with very high probability, increases the length of the compressed string by at most a factor of (1 + ε). The access time is O(log n + 1/ε²) in expectation and O(log n/ε²) with high probability. The scheme relies on sparse transitive-closure spanners. Any (consecutive) substring of the input string can be retrieved at an additive cost in running time equal to the length of the substring. The main benefit of the proposed scheme is that it preserves the online nature and simplicity of LZ78, and that for every input string, the length of the compressed string is only a small factor larger than that obtained by running LZ78.
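
For orientation, the following is a minimal Python sketch of textbook LZ78 parsing together with a naive random-access routine. It is not the scheme proposed here: the function names (`lz78_compress`, `lz78_decompress`, `build_index`, `access`) and the auxiliary arrays `starts` and `depth` are illustrative assumptions, and the access routine simply walks parent pointers in the phrase trie, so its cost can be as large as the length of the longest phrase. That walk is the step the abstract describes shortening by means of sparse transitive-closure spanners, which this sketch does not implement.

```python
from bisect import bisect_right


def lz78_compress(text):
    """Parse text into LZ78 phrases (parent phrase index, next character)."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    phrases = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch  # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Leftover input equals a known phrase; emit it as (its prefix, last char).
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases


def lz78_decompress(phrases):
    """Rebuild the input string from the phrase list."""
    table = [""]  # table[k] is the k-th phrase; table[0] is the empty phrase
    for parent, ch in phrases:
        table.append(table[parent] + ch)
    return "".join(table[1:])


def build_index(phrases):
    """Illustrative helper: start offset of each phrase and depth of each trie node."""
    starts, depth = [], [0]
    pos = 0
    for parent, _ in phrases:
        d = depth[parent] + 1  # phrase length equals its depth in the trie
        depth.append(d)
        starts.append(pos)
        pos += d
    return starts, depth


def access(phrases, starts, depth, i):
    """Return character i (0-based, assumed in range) by walking parent pointers."""
    k = bisect_right(starts, i) - 1   # phrase that covers position i
    node = k + 1                      # trie node of that phrase
    target_depth = i - starts[k] + 1  # depth of the ancestor holding the character
    while depth[node] > target_depth:
        node = phrases[node - 1][0]   # move one step towards the root
    return phrases[node - 1][1]


text = "abababcabc"
phrases = lz78_compress(text)
starts, depth = build_index(phrases)
assert lz78_decompress(phrases) == text
assert all(access(phrases, starts, depth, i) == text[i] for i in range(len(text)))
```

The binary search over `starts` accounts for a logarithmic term in locating the phrase; the parent-pointer walk is the part whose worst case a random-access scheme must bound.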
