Universal lossless source coding with the Burrows-Wheeler transform

We consider a theoretical evaluation of data compression algorithms based on the Burrows-Wheeler transform (BWT). The main contributions include a variety of very simple new techniques for BWT-based universal lossless source coding on finite-memory sources and a set of new rate-of-convergence results for BWT-based source codes. The result is a theoretical validation and quantification of the earlier experimental observation that BWT-based lossless source codes perform better than Ziv-Lempel-style codes and almost as well as prediction by partial matching (PPM) algorithms.
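For readers unfamiliar with the transform itself, the following is a minimal illustrative sketch of the forward and inverse BWT, not one of the coding techniques analyzed in the paper. It assumes a sentinel character `$` that does not occur in the input and sorts before every other symbol (a common textbook convention, used here in place of transmitting a row index), and it uses naive rotation sorting rather than a suffix-array construction. The function names `bwt` and `inverse_bwt` are hypothetical.

```python
def bwt(text, sentinel="$"):
    """Forward BWT via sorted rotations (naive, for illustration only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel  # assumption: sentinel marks the end of the block
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform output is the last column of the sorted rotation table.
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Invert the BWT by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The original string is the row that ends with the sentinel.
    for row in table:
        if row.endswith(sentinel):
            return row[:-1]
    raise ValueError("sentinel not found in input")


if __name__ == "__main__":
    encoded = bwt("banana")
    print(encoded)
    print(inverse_bwt(encoded))
```

The output tends to group symbols that share a following context into runs, which is the property that BWT-based source codes exploit.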
