On optimality of variants of the block sorting compression

Summary form only given. Block sorting uses the Burrows-Wheeler transformation (BWT), which permutes an input string. The permutation is defined by the lexicographic order of the contexts of symbols. If we assume that the probability of a symbol is determined by the preceding k symbols, called its context, then symbols whose contexts are the same are collected in consecutive regions after the BWT. Sadakane (1997) proposed a variant of block sorting that is asymptotically optimal for any finite-order Markov source if the permutation of symbols whose contexts are the same is random. However, the variant encodes l symbols as a block and is therefore not practical, because l is large. We propose two compression schemes that do not use blocks but instead encode symbols one by one using arithmetic codes. The move-to-front transformation is not used. The first scheme encodes symbols with different codes defined by the symbol frequencies in each context. It is asymptotically optimal for k-th order Markov sources, but it is applicable only if the order k of the source is already known. The second scheme divides the permuted string into many parts and encodes the symbols of each part with a separate arithmetic code. Each part has symbols whose contexts are the same. If the permutation is random, the scheme is asymptotically optimal for any finite-order Markov source. The permutation in the BWT is not completely random. However, we conjecture that the permuted string is memoryless and that our schemes work.
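The grouping property described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names are hypothetical, the input is treated as cyclic, the context is read backward from each symbol so that the k preceding symbols decide the sort order, and an ideal code length computed from symbol frequencies stands in for an actual arithmetic coder.

```python
from collections import Counter, defaultdict
from math import log2


def context_sort(s: str) -> list[int]:
    """Return the positions of s sorted by their preceding context.

    Assumption: s is cyclic, and the context of position i is read
    backward as s[i-1], s[i-2], ... so nearer symbols matter first.
    """
    n = len(s)

    def backward_context(i: int) -> str:
        return "".join(s[(i - j) % n] for j in range(1, n + 1))

    return sorted(range(n), key=backward_context)


def permuted_string(s: str) -> str:
    """Symbols of s listed in context-sorted order."""
    return "".join(s[i] for i in context_sort(s))


def group_by_context(s: str, k: int) -> dict[str, str]:
    """Split the permuted string into parts that share the same k preceding symbols."""
    n = len(s)
    parts = defaultdict(list)
    for i in context_sort(s):
        ctx = "".join(s[(i - j) % n] for j in range(k, 0, -1))
        parts[ctx].append(s[i])
    return {ctx: "".join(symbols) for ctx, symbols in parts.items()}


def empirical_code_length(part: str) -> float:
    """Ideal code length in bits for a part coded with its own symbol frequencies."""
    counts = Counter(part)
    total = len(part)
    return -sum(c * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    text = "abracadabra"
    print(permuted_string(text))
    for ctx, part in group_by_context(text, 1).items():
        print(repr(ctx), part, round(empirical_code_length(part), 3))
```

Because the sort key starts with the nearest preceding symbol, every part returned by `group_by_context` occupies one consecutive region of the permuted string, which is the property the second scheme relies on when it assigns a separate code to each part.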

[1]  Hidetoshi Yokoo. An adaptive data compression method based on context sorting, 1996, Proceedings of Data Compression Conference - DCC '96.

[2]  D. J. Wheeler, et al. A Block-sorting Lossless Data Compression Algorithm, 1994.

[3]  Kunihiko Sadakane. Text compression using recency rank with context and relation to context sorting, block sorting and PPM*, 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).