On Universal Compression with Constant Random Access

In many emerging applications of data compression, it is desirable to have random access to any block of the compressed dataset, that is, to recover a block without decompressing the entire compressed sequence and hence without reading all the stored bits from memory. In this work, we analyze the problem of universal data compression with random access. Building on the work of Mazumdar, Chandar, and Wornell (2015), we discuss a systematic scheme that achieves close-to-optimal compression with finite random access. We first analyze the performance of the scheme for i.i.d. sources. Using the intuition gained, we then show the existence of finite random access compression schemes for the more general class of Markov sources. Finally, we discuss a generic scheme that converts any universal compressor (e.g., Lempel-Ziv based schemes) into a finite random access universal compressor.
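To make the notion of random access concrete, the following is a minimal sketch of a naive block-wise baseline. It is not the scheme analyzed in this work. It assumes Python's standard `zlib` as a stand-in for an off-the-shelf compressor, and the helper names `compress_blocks` and `read_block` are hypothetical. Each fixed-size block is compressed independently and a table of byte offsets is kept, so a single block can be recovered by reading only its own compressed bytes.

```python
import zlib


def compress_blocks(data: bytes, block_size: int):
    """Compress each fixed-size block independently (naive baseline).

    Returns the concatenated compressed payload and a list of byte offsets;
    block i occupies payload[offsets[i]:offsets[i + 1]].
    """
    blobs = [
        zlib.compress(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]
    offsets = [0]
    for blob in blobs:
        offsets.append(offsets[-1] + len(blob))
    return b"".join(blobs), offsets


def read_block(payload: bytes, offsets, index: int) -> bytes:
    """Recover block `index` by touching only its own compressed bytes."""
    return zlib.decompress(payload[offsets[index]:offsets[index + 1]])


if __name__ == "__main__":
    data = b"abracadabra" * 100
    payload, offsets = compress_blocks(data, block_size=64)
    # Only the compressed bytes of block 3 are read here.
    print(read_block(payload, offsets, 3) == data[192:256])
```

This baseline makes no claim about compression rate: compressing short blocks independently and storing the offset table both cost extra bits, which is the kind of overhead that schemes with near-optimal rate and finite random access aim to avoid.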

[1] Mikel Hernaez, et al. GTRAC: fast retrieval from compressed collections of genomic variants, 2016, Bioinform.

[2] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[3] Muriel Médard, et al. On locally decodable source coding, 2013, 2015 IEEE International Conference on Communications (ICC).

[4] Zhen Zhang, et al. The redundancy of source coding with a fidelity criterion: 1. Known statistics, 1997, IEEE Trans. Inf. Theory.

[5] Paolo Ferragina, et al. Indexing compressed text, 2005, JACM.

[6] Gregory W. Wornell, et al. Local recovery in data compression for general sources, 2015, 2015 IEEE International Symposium on Information Theory (ISIT).

[7] Thomas A. Courtade, et al. Compressing sparse sequences under local decodability constraints, 2015, 2015 IEEE International Symposium on Information Theory (ISIT).

[8] Abraham Lempel, et al. Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[9] Peter Bro Miltersen, et al. Are bitvectors optimal?, 2000, STOC '00.