Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms

The Burrows-Wheeler transform [1] is one of the mainstays of lossless data compression. In most cases, its output is fed to Move to Front or to other variants of symbol ranking compression. One of the main open problems [2] is to establish whether Move to Front, or more generally symbol ranking compression, is an essential part of the compression process. We settle this question positively by providing a new class of Burrows-Wheeler algorithms that use optimal partitions of strings, rather than symbol ranking, for the additional step. Our technique is a quite surprising specialization to strings of the partitioning techniques devised by Buchsbaum et al. [3] for two-dimensional table compression. Following Manzini [4], we analyze two algorithms in the new class in terms of the k-th order empirical entropy of a string and, for both algorithms, we obtain better compression guarantees than those reported in [4] for Burrows-Wheeler algorithms that use Move to Front.
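For orientation, the conventional pipeline that the abstract contrasts against (Burrows-Wheeler transform followed by Move to Front) and the k-th order empirical entropy used in the analysis can be sketched as below. This is a minimal illustration only, not the partition-based algorithms of the paper: the function names `bwt`, `move_to_front`, and `empirical_entropy` are hypothetical, the transform uses naive rotation sorting, and the sentinel character is an assumption made so that the transform is invertible.

```python
import math
from collections import Counter, defaultdict


def bwt(text, sentinel="\0"):
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes `sentinel` does not occur in `text`. Illustration only:
    practical implementations use suffix arrays instead.
    """
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def move_to_front(data, alphabet):
    """Replace each symbol by its current rank, then move it to the front."""
    table = list(alphabet)
    ranks = []
    for ch in data:
        r = table.index(ch)
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks


def empirical_entropy(s, k=0):
    """k-th order empirical entropy of s, in bits per symbol.

    For every length-k context w, collect the symbols that follow w in s
    and sum their zeroth-order entropies, weighted by how often w occurs.
    """
    n = len(s)
    if n == 0:
        return 0.0
    followers = defaultdict(Counter)
    for i in range(k, n):
        followers[s[i - k:i]][s[i]] += 1
    total_bits = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total_bits -= sum(c * math.log2(c / m) for c in counts.values())
    return total_bits / n


transformed = bwt("banana", sentinel="$")
ranks = move_to_front(transformed, sorted(set(transformed)))
```

On the input `"banana"` with `"$"` as sentinel, the transform groups equal symbols together, which is what makes a symbol ranking step such as Move to Front effective and what the paper proposes to replace with an optimal partition of the transformed string.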

[1]  Peter M. Fenwick. The Burrows-Wheeler Transform for Block Sorting Text Compression: Principles and Improvements, 1996, Comput. J.

[2]  Peter Elias, et al. Universal codeword sets and representations of the integers, 1975, IEEE Trans. Inf. Theory.

[3]  R. Nigel Horspool, et al. Data Compression Using Dynamic Markov Modelling, 1987, Comput. J.

[4]  Alistair Moffat, et al. Implementing the PPM data compression scheme, 1990, IEEE Trans. Commun.

[5]  Robert E. Tarjan, et al. A Locally Adaptive Data, 1986.

[6]  Kunihiko Sadakane. On optimality of variants of the block sorting compression, 1998, Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225).

[7]  Raffaele Giancarlo, et al. Improving table compression with combinatorial optimization, 2002, SODA '02.

[8]  Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.

[9]  Kenneth Ward Church, et al. Engineering the compression of massive tables: an experimental approach, 2000, SODA '00.

[10]  Thomas M. Cover, et al. Elements of Information Theory, 2005.

[11]  Spyros S. Magliveras, et al. Block sorting and compression, 1997, Proceedings DCC '97. Data Compression Conference.

[12]  Abraham Lempel, et al. Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[13]  Inder Jeet Taneja, et al. Bounds on the redundancy of Huffman codes, 1986, IEEE Trans. Inf. Theory.

[14]  Giovanni Manzini, et al. An analysis of the Burrows-Wheeler transform, 2001, SODA '99.

[15]  Alistair Moffat, et al. Can we do without ranks in Burrows Wheeler transform compression?, 2001, Proceedings DCC 2001. Data Compression Conference.

[16]  Michelle Effros, et al. Universal lossless source coding with the Burrows Wheeler transform, 1999, Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096).

[17]  John G. Cleary, et al. Unbounded Length Contexts for PPM, 1997.