Incremental frequency count—a post BWT-stage for the Burrows–Wheeler compression algorithm

The stage after the Burrows–Wheeler transform (BWT) plays a key role in the Burrows–Wheeler compression algorithm because it transforms the BWT output from a local context into a global context. This paper presents the Incremental Frequency Count stage, a new post-BWT stage. The new stage is paired with a run length encoding stage between the BWT and the entropy coding stage of the algorithm. It offers high throughput similar to that of a Move To Front stage and, at the same time, good compression rates comparable to those of the strong but slow Weighted Frequency Count stage. The Incremental Frequency Count stage is compared with the Move To Front and Weighted Frequency Count stages in terms of compression rate and speed on the Calgary and large Canterbury corpora. Copyright © 2006 John Wiley & Sons, Ltd.
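
The post-BWT stage sits between the transform and the entropy coder and turns the clustered BWT output into small integer ranks. The following is a minimal Python sketch, assuming byte input and a naive rotation-sorting BWT (practical compressors use suffix sorting instead). It contrasts a Move To Front stage with a counter-based ranking stage. The function `frequency_rank`, its `increment` and `limit` parameters, and the halving rule are hypothetical illustrations in the spirit of frequency-count stages. They are not the Incremental Frequency Count update rules from the paper, and the run length encoding stage is omitted.

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform by sorting all rotations.

    Returns the last column of the sorted rotation matrix and the row index
    of the original string. O(n^2 log n); only suitable for small examples.
    """
    n = len(data)
    rotations = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in rotations)  # last char of rotation i
    return last, rotations.index(0)


def move_to_front(data: bytes) -> list[int]:
    """Move To Front: emit each byte's current list position, then move it to the front."""
    table = list(range(256))
    out = []
    for b in data:
        rank = table.index(b)
        out.append(rank)
        table.pop(rank)
        table.insert(0, b)
    return out


def frequency_rank(data: bytes, increment: int = 16, limit: int = 1 << 12) -> list[int]:
    """Hypothetical counter-based ranking stage (NOT the paper's IFC rules).

    Symbols are kept ordered by descending counter; each byte is emitted as its
    current rank, its counter grows by `increment`, and all counters are halved
    once one exceeds `limit` so that old statistics fade out.
    """
    counts = [0] * 256
    order = list(range(256))  # kept non-increasing in counts
    out = []
    for b in data:
        rank = order.index(b)
        out.append(rank)
        counts[b] += increment
        if counts[b] > limit:
            counts = [c // 2 for c in counts]  # floor-halving never inverts the order
        # Move b forward past every symbol whose counter does not exceed its own.
        i = rank
        while i > 0 and counts[order[i - 1]] <= counts[b]:
            order[i - 1], order[i] = order[i], order[i - 1]
            i -= 1
    return out


last, primary = bwt(b"banana")
print(last, primary)          # b'nnbaaa' 3
print(move_to_front(last))    # [110, 0, 99, 99, 0, 0]
print(frequency_rank(last))   # [110, 0, 99, 99, 1, 0]
```

Both stages map the runs in the BWT output to small ranks. The counter-based sketch promotes the newly seen `a` more slowly than Move To Front (rank 1 instead of 0 on its second occurrence). This roughly illustrates the damping that frequency-count stages apply so that a single occurrence does not immediately displace a frequently used symbol.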
