Burrows-Wheeler Transform

The Burrows-Wheeler transform is a technique for the lossless compression of data. It is the algorithmic core of the tool bzip2, which has become a standard for the creation and distribution of compressed archives.

Before the introduction of the Burrows-Wheeler transform, the field of lossless data compression was dominated by two approaches (see [2,21] for comprehensive surveys). The first approach dates back to the pioneering works of Shannon and Huffman and is based on the idea of using shorter codewords for the more frequent symbols. This idea gave rise to Huffman coding and arithmetic coding and, more recently, to the PPM (prediction by partial matching) family of compression algorithms. The second approach originated from the works of Lempel and Ziv and is based on the idea of adaptively building a dictionary and representing the input string as a concatenation of dictionary words. The best-known compressors based on this approach form the so-called ZIP-family; they have been the standard for several years and are available on essentially any computing platform (e.g., gzip, zip, and winzip, to cite just a few).

The Burrows-Wheeler transform introduced a completely new approach to lossless data compression, based on the idea of transforming the input to make it easier to compress. In the authors’ words: “(this) technique [...] works by applying a reversible transformation to a block of text to make redundancy in the input more accessible to simple coding schemes” [5, Sect. 7]. Not only has this technique produced some state-of-the-art compressors, but it also gave rise to the field of compressed indexes [20] and has been successfully extended to compress (and index) structured data such as XML files [11] and tables [22].
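To make the idea of a "reversible transformation" concrete, the following is a minimal sketch of the textbook sorted-rotations formulation of the transform and its inverse. It is illustrative only and is not how bzip2 or any production tool implements it. The function names `bwt` and `inverse_bwt` and the end-of-string sentinel `$` are assumptions introduced here; the sentinel is assumed not to occur in the input.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    The sentinel (an assumption of this sketch) marks the end of the input
    so that the transform can be inverted without storing an extra index.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Build every cyclic rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The output is the last character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # After k rounds, table holds the first k characters of each sorted rotation.
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The rotation that ends with the sentinel is the original string.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

In the output `annb$aa`, equal symbols tend to appear next to each other; this clustering is the redundancy that simple coding schemes can then exploit. The sketch builds and sorts every rotation explicitly, which is wasteful in time and space. Practical implementations rely on suffix sorting instead.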

[1] Fabrizio Luccio, et al. Compressing and indexing labeled trees, with applications, 2009, JACM.

[2] Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.

[3] Antonio Restivo, et al. An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression, 2005, CPM.

[4] Joseph Gil, et al. A Bijective String Sorting Transform, 2012, ArXiv.

[5] Haim Kaplan, et al. A simpler analysis of Burrows-Wheeler-based compression, 2007, Theor. Comput. Sci.

[6] Robert E. Tarjan, et al. A Locally Adaptive Data, 1986.

[7] Kiem-Phong Vo, et al. Using column dependency to compress tables, 2004, Data Compression Conference, 2004. Proceedings. DCC 2004.

[8] Travis Gagie, et al. Move-to-Front, Distance Coding, and Inversion Frequencies revisited, 2010, Theor. Comput. Sci.

[9] Peter M. Fenwick. The Burrows-Wheeler Transform for Block Sorting Text Compression: Principles and Improvements, 1996, Comput. J.

[10] Wing-Kai Hon, et al. Breaking a Time-and-Space Barrier in Constructing Full-Text Indices, 2009, SIAM J. Comput.

[11] Sen Zhang, et al. Unifying the Burrows-Wheeler and the Schindler transforms, 2006, Data Compression Conference (DCC'06).

[12] M. Schindler, et al. A fast block-sorting algorithm for lossless data compression, 1997, Proceedings DCC '97. Data Compression Conference.

[13] S. Kulkarni, et al. Output distribution of the Burrows-Wheeler transform, 2000, 2000 IEEE International Symposium on Information Theory (Cat. No.00CH37060).

[14] Ziya Arnavut. Move-to-front and inversion coding, 2000, Proceedings DCC 2000. Data Compression Conference.

[15] Ian H. Witten, et al. Text Compression, 1990, 125 Problems in Text Algorithms.

[16] Abraham Lempel, et al. Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[17] Sanjeev R. Kulkarni, et al. Universal lossless source coding with the Burrows Wheeler Transform, 2002, IEEE Trans. Inf. Theory.

[18] Ian H. Witten, et al. Data Compression Using Adaptive Coding and Partial String Matching, 1984, IEEE Trans. Commun.

[19] Joong Chae Na. Linear-Time Construction of Compressed Suffix Arrays Using o(n log n)-Bit Working Space for Large Alphabets, 2005, CPM.

[20] David Salomon, et al. Data Compression: The Complete Reference, 2006.

[21] Ziya Arnavut, et al. Investigation of block-sorting of multiset permutations, 2004, Int. J. Comput. Math.

[22] Raffaele Giancarlo, et al. The Engineering of a Compression Boosting Library: Theory vs Practice in BWT Compression, 2006, ESA.

[23] Ira M. Gessel, et al. Counting Permutations with Given Cycle Structure and Descent Set, 1993, J. Comb. Theory A.

[24] Travis Gagie, et al. Lightweight Data Indexing and Compression in External Memory, 2009, Algorithmica.

[25] Sen Zhang, et al. Computing Inverse ST in Linear Complexity, 2008, CPM.

[26] Jean Pierre Duval, et al. Factorizing Words over an Ordered Alphabet, 1983, J. Algorithms.

[27] Bernhard Balkenhol, et al. Universal Data Compression Based on the Burrows-Wheeler Transformation: Theory and Practice, 2000, IEEE Trans. Computers.

[28] Peter Sanders, et al. Linear work suffix array construction, 2006, JACM.

[29] Raffaele Giancarlo, et al. The myriad virtues of Wavelet Trees, 2009, Inf. Comput.

[30] Raffaele Giancarlo, et al. Boosting textual compression in optimal linear time, 2005, JACM.

[31] Joong Chae Na, et al. Alphabet-independent linear-time construction of compressed suffix arrays using o(n log n)-bit working space, 2007, Theor. Comput. Sci.

[32] Fabrizio Luccio, et al. Structuring labeled trees for optimal succinctness, and beyond, 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[33] Alistair Moffat, et al. Can we do without ranks in Burrows Wheeler transform compression?, 2001, Proceedings DCC 2001. Data Compression Conference.

[34] D. J. Wheeler, et al. A Block-sorting Lossless Data Compression Algorithm, 1994.

[35] Giovanni Manzini, et al. Opportunistic data structures with applications, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[36] Gonzalo Navarro, et al. An Alphabet-Friendly FM-Index, 2004, SPIRE.

[37] Amar Mukherjee, et al. Improving text compression ratios with the Burrows-Wheeler transform, 1999, Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096).

[38] Gonzalo Navarro, et al. Compressed full-text indexes, 2007, CSUR.

[39] Juha Kärkkäinen, et al. Fast BWT in small space by blockwise suffix sorting, 2007, Theor. Comput. Sci.

[40] Maxime Crochemore, et al. A note on the Burrows-Wheeler transformation, 2005, Theor. Comput. Sci.

[41] Ronald Rosenfeld, et al. Topic adaptation for language modeling using unnormalized exponential models, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[42] Giovanni Manzini, et al. An analysis of the Burrows-Wheeler transform, 2001, SODA '99.

[43] Antonio Restivo, et al. An extension of the Burrows-Wheeler Transform, 2007, Theor. Comput. Sci.

[44] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[45] Kiem-Phong Vo, et al. Compressing table data with column dependency, 2007, Theor. Comput. Sci.