The Burrows-Wheeler Transform for Block Sorting Text Compression: Principles and Improvements

A recent development in text compression is a 'block sorting' algorithm which permutes the input text according to a special sort procedure and then processes the permuted text with a Move-To-Front (MTF) transformation followed by a final statistical compressor. The technique combines good speed with excellent compression performance. This paper investigates the fundamental operation of the algorithm and presents some improvements based on that analysis. Although block sorting is clearly related to previous compression techniques, it appears to be best described by techniques derived from Shannon's work on the prediction and entropy of English text [6]. A simple model is developed which relates the compression achieved to the proportion of zeros in the output of the MTF stage.
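To make the first two stages of the pipeline concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the sort is a naive rotation sort (quadratic memory, suitable only for short inputs), the original row is recorded as a primary index rather than with an end-of-block sentinel, and MTF starts from a 256-entry byte table in value order. The function names `bwt`, `inverse_bwt`, `mtf_encode` and `zero_fraction` are hypothetical. The final statistical compressor is omitted; `zero_fraction` only measures the quantity that the model above relates to compression.

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Forward BWT by naive rotation sort (illustration only).

    Returns the last column of the sorted rotation matrix and the primary
    index, i.e. the row that holds the original input.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by the rotation each one begins.
    rows = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The last symbol of the rotation starting at i is data[i - 1].
    last = bytes(data[(i - 1) % n] for i in rows)
    return last, rows.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the BWT by following the first/last column correspondence."""
    n = len(last)
    # A stable sort of positions by symbol pairs each row of the first column
    # with the same symbol occurrence in the last column.
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return bytes(out)


def mtf_encode(data: bytes) -> list[int]:
    """Move-To-Front: emit each byte's current rank, then move it to the front."""
    table = list(range(256))  # assumed initial order: byte values 0..255
    out = []
    for b in data:
        rank = table.index(b)
        out.append(rank)
        table.insert(0, table.pop(rank))
    return out


def zero_fraction(ranks: list[int]) -> float:
    """Proportion of MTF outputs that are zero."""
    return ranks.count(0) / len(ranks) if ranks else 0.0


if __name__ == "__main__":
    text = b"banana"
    last, primary = bwt(text)
    ranks = mtf_encode(last)
    print(last, primary, ranks, zero_fraction(ranks))
    # prints: b'nnbaaa' 3 [110, 0, 99, 99, 0, 0] 0.5
    assert inverse_bwt(last, primary) == text
```

After the first symbol, an MTF rank of zero means the symbol repeats the one immediately before it. Because the sort places symbols that occur in similar contexts next to each other, longer and more repetitive texts tend to produce long runs of identical symbols in the permuted output, and hence a high proportion of zeros after MTF.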

References

[1] Hidetoshi Yokoo. An adaptive data compression method based on context sorting. Proceedings of the Data Compression Conference (DCC '96), 1996.

[2] Spyros S. Magliveras et al. Block sorting and compression. Proceedings of the Data Compression Conference (DCC '97), 1997.

[3] John G. Cleary et al. Unbounded length contexts for PPM. 1997.

[4] Daniel S. Hirschberg et al. Streamlining context models for data compression. Proceedings of the Data Compression Conference (DCC '91), 1991.

[5] Jeffrey Scott Vitter et al. Design and analysis of fast text compression based on quasi-arithmetic coding. Information Processing & Management, 1994.

[6] Claude E. Shannon. Prediction and entropy of printed English. 1951.

[7] M. Nelson. Data compression with the Burrows-Wheeler transform. 1996.

[8] M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. 1994.

[9] Charles Bloom et al. LZP: a new data compression algorithm. Proceedings of the Data Compression Conference (DCC '96), 1996.

[10] Alistair Moffat. Implementing the PPM data compression scheme. IEEE Transactions on Communications, 1990.