6 – Context-Based Compression

Publisher Summary

This chapter presents a number of techniques that make minimal prior assumptions about the statistics of the data, and briefly discusses context-based algorithms. The best-known context-based algorithm is the “ppm algorithm,” first proposed by Cleary and Witten. The idea behind ppm is simple. The basic algorithm first attempts to encode a symbol using the largest context, whose size is fixed in advance. If the symbol to be encoded has not previously been encountered in this context, an escape symbol is encoded and the algorithm moves to the next smaller context. If the symbol has not occurred in that context either, the context is shortened again. This continues until the algorithm either finds a context in which the symbol has previously occurred or concludes that the symbol has not been encountered in any context.

The chapter also discusses arithmetic coding in this setting. The basic idea behind arithmetic coding is the division of the unit interval into subintervals, each of which represents a particular letter. The smaller the subinterval, the more bits are required to distinguish it from other subintervals. If the number of symbols to be represented can be reduced, the number of subintervals goes down as well, so the remaining subintervals become larger and fewer bits are needed for encoding. The exclusion principle used in ppm provides this kind of reduction in rate.
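
The escape-and-fall-back procedure and the effect of exclusion can be illustrated with a small sketch. It is not the chapter's implementation: the function names (build_counts, code_length), the escape estimate (a fixed escape count of one in every context), and the uniform fallback over the alphabet are all assumptions made for illustration. Instead of running an arithmetic coder, the sketch sums the ideal cost of -log2(p) bits for each escape and for the final symbol, which is the length an arithmetic coder would approach.

```python
import math
from collections import defaultdict


def build_counts(history, max_order):
    """Count how often each symbol followed every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, sym in enumerate(history):
        for k in range(max_order + 1):
            if i - k >= 0:
                counts[history[i - k:i]][sym] += 1
    return counts


def code_length(history, symbol, max_order, alphabet, use_exclusion=True):
    """Ideal code length in bits for `symbol` following `history`.

    Assumptions (illustrative only): the escape symbol has a count of 1 in
    every context, and a symbol never seen in any context is coded with a
    uniform distribution over the (remaining) alphabet.
    """
    counts = build_counts(history, max_order)
    bits = 0.0
    excluded = set()  # symbols already ruled out by an escape
    # Start with the largest context and shrink it one symbol at a time.
    for k in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - k:]
        seen = counts.get(context)
        if not seen:
            continue  # context never occurred; the decoder knows this too
        if use_exclusion:
            active = {s: c for s, c in seen.items() if s not in excluded}
        else:
            active = dict(seen)
        total = sum(active.values())
        if total == 0:
            continue  # every symbol here was already excluded
        if symbol in active:
            bits += -math.log2(active[symbol] / (total + 1))
            return bits
        # Symbol not found: pay for an escape and move to a smaller context.
        bits += -math.log2(1 / (total + 1))
        excluded.update(seen)
    # The symbol has not been encountered in any context.
    if use_exclusion:
        remaining = [s for s in alphabet if s not in excluded]
    else:
        remaining = list(alphabet)
    bits += math.log2(len(remaining))
    return bits


if __name__ == "__main__":
    for sym in "abc":
        with_excl = code_length("abab", sym, 1, "abc", use_exclusion=True)
        without = code_length("abab", sym, 1, "abc", use_exclusion=False)
        print(sym, round(with_excl, 3), round(without, 3))
```

Under these assumptions, coding the unseen symbol c after the history abab costs about 2.58 bits with exclusion and about 4.91 bits without it. The saving comes from removing symbols that an escape has already ruled out, so the remaining symbols share a larger portion of the interval.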