An analysis of the Burrows-Wheeler transform

The Burrows-Wheeler Transform (also known as Block-Sorting) is at the base of compression algorithms that are the state of the art in lossless data compression. In this paper, we analyze two algorithms that use this technique. The first one is the original algorithm described by Burrows and Wheeler, which, despite its simplicity, outperforms the Gzip compressor. The second one uses an additional run-length encoding step to improve compression. We prove that the compression ratio of both algorithms can be bounded in terms of the kth-order empirical entropy of the input string for any k ≥ 0. We make no assumptions on the input, and we obtain bounds that hold in the worst case, that is, for every possible input string. All previous results for Block-Sorting algorithms were concerned with the average compression ratio and were established under the assumption that the input comes from a finite-order Markov source.
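To make the two central notions concrete, the following is a minimal illustrative sketch, not the implementation analyzed in the paper. It assumes a naive rotation-sorting construction of the transform and one common definition of the kth-order empirical entropy, in which each symbol is grouped by the k symbols that precede it. The function names `bwt` and `empirical_entropy` and the sentinel character are assumptions introduced here for illustration.

```python
import math
from collections import Counter, defaultdict


def bwt(s, sentinel="\0"):
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    The sentinel (an assumption of this sketch) marks the end of the input
    and must not occur in s. Real implementations use suffix sorting instead.
    """
    assert sentinel not in s
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def empirical_entropy(s, k=0):
    """kth-order empirical entropy of s, in bits per symbol.

    For every length-k context w occurring in s, collect the symbols that
    follow w, and sum their zeroth-order entropies weighted by their counts.
    """
    n = len(s)
    if n == 0:
        return 0.0
    followers = defaultdict(Counter)
    for i in range(k, n):
        followers[s[i - k:i]][s[i]] += 1
    total = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total -= sum(c * math.log2(c / m) for c in counts.values())
    return total / n
```

For instance, `empirical_entropy("abab", 0)` is 1 bit per symbol, while `empirical_entropy("abab", 1)` is 0 because each symbol is fully determined by its predecessor. Bounds of the kind stated above relate the compressed output size to such quantities for every input string, without probabilistic assumptions on the source.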
