Higher Compression from the Burrows-Wheeler Transform with New Algorithms for the List Update Problem

Burrows-Wheeler compression is a three-stage process: the data is transformed with the Burrows-Wheeler Transform, then transformed with Move-To-Front, and finally encoded with an entropy coder. Move-To-Front, Transpose, and Frequency Count are some of the many algorithms used on the List Update problem. In 1985, competitive analysis first showed the superiority of Move-To-Front over Transpose and Frequency Count for the List Update problem with arbitrary data. Earlier studies due to Bitner assumed independent, identically distributed data and showed that while Move-To-Front adapts to a distribution faster, incurring less overwork, the asymptotic costs of Frequency Count and Transpose are lower. The improvements to Burrows-Wheeler compression covered in this work increase the amount, not the speed, of compression. Best x of 2x−1 is a new family of algorithms created to improve on Move-To-Front's processing of the output of the Burrows-Wheeler Transform, which resembles piecewise independent, identically distributed data. Several variations of Move One From Front and part of the randomized algorithm Timestamp are also analyzed for overwork, asymptotic cost, and competitive ratio, both as the middle stage of Burrows-Wheeler compression and as algorithms for the List Update problem. The Best x of 2x−1 family includes Move-To-Front, the part of Timestamp of interest, and Frequency Count. Lastly, a greedy choosing scheme, Snake, switches back and forth between two List Update algorithms as the amount of compression each achieves fluctuates, to increase overall compression. The Burrows-Wheeler Transform is based on sorting of contexts. The other improvements are better sorting orders, such as “aeioubcdf…” instead of standard alphabetical “abcdefghi…” on English text data; an algorithm for computing orders for any data; and Gray code sorting instead of standard sorting. Both techniques lessen the overwork incurred by whatever List Update algorithms are used by reducing the difference between adjacent sorted contexts.
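
The first two stages and the three classical list update rules named above can be illustrated with a minimal Python sketch. It is not the implementation from this work. The function names `bwt` and `list_update`, the `order` parameter, and the `"\0"` end-of-text sentinel are assumptions made for illustration, and the transform is the naive one that sorts all rotations rather than an efficient suffix sort. The tie-breaking used for Frequency Count (an item moves only past items with a strictly smaller count) is likewise one common convention, assumed here. Best x of 2x−1, Snake, and Gray code sorting are not shown.

```python
SENTINEL = "\0"  # assumed end-of-text marker; must not occur in the input


def bwt(text, order=None):
    """Naive Burrows-Wheeler Transform: sort all rotations, keep last column.

    `order` is an optional string giving a custom alphabet order
    (for example "aeioubcdf..."); symbols missing from it sort after
    the listed ones, by code point. The sentinel always sorts first.
    """
    if SENTINEL in text:
        raise ValueError("input must not contain the sentinel")
    s = text + SENTINEL
    if order is None:
        key = None  # plain code-point order; "\0" is already the smallest
    else:
        rank = {c: i for i, c in enumerate(order)}

        def key(rotation):
            return [-1 if c == SENTINEL else rank.get(c, len(order) + ord(c))
                    for c in rotation]

    rotations = sorted((s[i:] + s[:i] for i in range(len(s))), key=key)
    return "".join(r[-1] for r in rotations)


def list_update(data, alphabet, rule):
    """Encode `data` as list positions under a list update rule.

    rule: "mtf" (Move-To-Front), "transpose", or "fc" (Frequency Count).
    """
    lst = list(alphabet)
    counts = dict.fromkeys(lst, 0)
    out = []
    for sym in data:
        i = lst.index(sym)      # access cost is the position in the list
        out.append(i)
        counts[sym] += 1
        if rule == "mtf":
            lst.insert(0, lst.pop(i))            # move to the front
        elif rule == "transpose":
            if i > 0:                            # swap with predecessor
                lst[i - 1], lst[i] = lst[i], lst[i - 1]
        elif rule == "fc":
            # keep the list in non-increasing order of access count
            while i > 0 and counts[lst[i - 1]] < counts[sym]:
                lst[i - 1], lst[i] = lst[i], lst[i - 1]
                i -= 1
        else:
            raise ValueError("unknown rule: " + rule)
    return out


transformed = bwt("banana")
alphabet = sorted(set(transformed))
for rule in ("mtf", "transpose", "fc"):
    print(rule, list_update(transformed, alphabet, rule))
```

In a full compressor, the integer sequence produced by `list_update` would be passed to an entropy coder. Sequences dominated by small values compress better, which is why the choice of list update rule and of sorting order affects the final size.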
