The context trees of block sorting compression

The Burrows-Wheeler transform (BWT; Burrows and Wheeler, 1994) and block sorting compression are closely related to the context trees of PPM. The usual approach of treating BWT as merely a permutation cannot fully exploit this relation. We show that an explicit context tree for BWT can be generated efficiently by taking a subset of the corresponding suffix tree, identify the central problems in exploiting its structure, and trace the influence of the context tree on the common move-to-front schemes. We experimentally obtain limits for compression using the constructed trees and, in an attempt to use the full context tree, present a compression scheme that represents the context tree explicitly as part of the compressed data. We argue that a deliberate treatment of the context tree should be able to achieve the full compression performance of PPM while maintaining the computational efficiency of BWT. Thus, BWT with explicit context trees is a strong candidate for powerful general compression, especially for large data files.
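
The relation between BWT output and contexts can be illustrated with a minimal Python sketch. It is not the suffix-tree construction described here: it sorts all rotations naively, which is far too slow for large files, and it assumes a sentinel character "$" that does not occur in the input and sorts before every other symbol. The function names (bwt, context_groups, move_to_front) are hypothetical and chosen only for this illustration. Grouping the sorted rotations by their first few characters shows that each context owns one contiguous run of the BWT output, which is the structure a context tree makes explicit and which move-to-front coding exploits only implicitly.

```python
def sorted_rotations(text, sentinel="$"):
    # Assumption: the sentinel is absent from text and sorts first.
    s = text + sentinel
    return sorted(s[i:] + s[:i] for i in range(len(s)))


def bwt(text, sentinel="$"):
    # The BWT output is the last column of the sorted rotation matrix.
    return "".join(r[-1] for r in sorted_rotations(text, sentinel))


def context_groups(text, order, sentinel="$"):
    # Group output symbols by the first `order` characters of their
    # rotation, i.e. by the context that follows each symbol in the text.
    groups = {}
    for r in sorted_rotations(text, sentinel):
        groups.setdefault(r[:order], []).append(r[-1])
    return {ctx: "".join(syms) for ctx, syms in groups.items()}


def move_to_front(data, alphabet):
    # Plain move-to-front coding: emit the current rank of each symbol,
    # then move that symbol to the front of the table.
    table = list(alphabet)
    ranks = []
    for ch in data:
        idx = table.index(ch)
        ranks.append(idx)
        table.insert(0, table.pop(idx))
    return ranks


if __name__ == "__main__":
    out = bwt("banana")
    print(out)
    print(context_groups("banana", 1))
    print(move_to_front(out, sorted(set(out))))
```

Concatenating the groups in sorted context order reproduces the BWT output, so the boundaries between groups are exactly the information that is lost when the output is treated as a bare permutation.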