A hybrid approach to text compression

Text compression schemes have sometimes been divided into two classes: symbolwise methods, which form a source model, typically using a finite context to predict symbols; and dictionary methods, which replace phrases (groups of symbols) in the input with codes. Some dictionary methods can be decomposed into equivalent symbolwise methods. The decomposed method gives identical compression performance but is slower, because more coded symbols are transmitted. This decomposition is of interest primarily because it helps in comparing the two classes of method. The authors explore a hybrid approach based on the reverse of this decomposition: the predictions of a symbolwise method are grouped together so that several characters can be coded at once. The objective is to combine the good compression of symbolwise methods with the high speed of dictionary methods. The hybrid allows tradeoffs among compression speed, compression performance, and memory usage. More importantly, investigating a hybrid method gives extra insight into the relationship between dictionary and symbolwise methods, and reveals that they are more closely related than might be expected.
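
The idea of grouping symbolwise predictions can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a hypothetical deterministic order-k model that predicts each character as the one that followed the most recent earlier occurrence of the same k-character context, and it omits the entropy coder that a real scheme would apply to the tokens. Consecutive correct predictions are grouped into a single run length, so one token covers several characters, much as a dictionary phrase does. The names `encode`, `decode`, `Token`, and the parameter `k` are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical token format: (run, literal) means `run` characters that the
# model predicted correctly, followed by one literal (None at end of input).
Token = Tuple[int, Optional[str]]


def _context(seq: Sequence[str], i: int, k: int) -> str:
    """The k characters preceding position i, used as the model's context."""
    return "".join(seq[i - k:i])


def _predict(seq: Sequence[str], i: int, k: int, last: Dict[str, int]) -> Optional[str]:
    """Predict seq[i] as the character that followed the most recent earlier
    occurrence of the same order-k context (None if the context is unseen)."""
    if i < k:
        return None
    j = last.get(_context(seq, i, k))
    return None if j is None else seq[j]


def _update(seq: Sequence[str], i: int, k: int, last: Dict[str, int]) -> None:
    """Remember that the context ending just before position i occurred at i."""
    if i >= k:
        last[_context(seq, i, k)] = i


def encode(text: str, k: int = 2) -> List[Token]:
    last: Dict[str, int] = {}
    tokens: List[Token] = []
    run = 0
    for i, ch in enumerate(text):
        if _predict(text, i, k, last) == ch:
            run += 1  # correct prediction: extend the current group
        else:
            tokens.append((run, ch))  # close the group with a literal
            run = 0
        _update(text, i, k, last)
    if run:
        tokens.append((run, None))  # trailing group with no literal
    return tokens


def decode(tokens: List[Token], k: int = 2) -> str:
    last: Dict[str, int] = {}
    out: List[str] = []
    for run, literal in tokens:
        for _ in range(run):
            # Replay the model: the decoder's state matches the encoder's,
            # so its prediction is the character the encoder saw.
            i = len(out)
            out.append(_predict(out, i, k, last))
            _update(out, i, k, last)
        if literal is not None:
            i = len(out)
            out.append(literal)
            _update(out, i, k, last)
    return "".join(out)
```

For example, `encode("abcabcabcabc")` emits five literal tokens while the model is warming up, then codes the final seven characters, all predicted correctly, as the single token `(7, None)`. Because encoder and decoder update the model identically after every character, grouping changes how many tokens are coded but not which characters are predicted.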
