Data Compression Using Adaptive Coding and Partial String Matching

The recently developed technique of arithmetic coding, in conjunction with a Markov model of the source, is a powerful method of data compression in situations where a linear treatment is inappropriate. Adaptive coding allows the model to be constructed dynamically by both encoder and decoder during the course of the transmission, and has been shown to incur a smaller coding overhead than explicit transmission of the model's statistics. But there is a basic conflict between the desire to use high-order Markov models and the need to have them formed quickly as the initial part of the message is sent. This paper describes how the conflict can be resolved with partial string matching, and reports experimental results which show that mixed-case English text can be coded in as little as 2.2 bits/character with no prior knowledge of the source.
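The idea of backing off from long contexts to shorter ones can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `estimate_bits`, the escape estimate of 1/(n+1), the uniform 8-bit fallback for never-seen symbols, and the update of every context order after each symbol are all assumptions made for illustration. Rather than running an arithmetic coder, the sketch sums the ideal code length, -log2(p), that such a coder would approach. Because each probability depends only on symbols already processed, a decoder could rebuild the same counts in step with the encoder.

```python
import math
from collections import defaultdict


def estimate_bits(data: bytes, max_order: int = 3) -> float:
    """Estimate the ideal adaptive code length of `data`, in bits.

    Hypothetical sketch: contexts of length max_order down to 0 are tried
    in turn; when the symbol is new in a context, an escape is charged and
    the next shorter context is tried.
    """
    # counts[context][symbol] = times `symbol` has followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0

    for i, sym in enumerate(data):
        coded = False
        # Try the longest available context first, then shorter ones.
        for order in range(min(max_order, i), -1, -1):
            ctx = data[i - order:i]
            seen = counts.get(ctx)
            if not seen:
                # Context never seen: both sides know this, so no cost.
                continue
            n = sum(seen.values())
            if sym in seen:
                # Symbol predicted in this context (assumed estimate c/(n+1)).
                total_bits -= math.log2(seen[sym] / (n + 1))
                coded = True
                break
            # Symbol is new here: pay for an escape (assumed estimate 1/(n+1)).
            total_bits -= math.log2(1 / (n + 1))
        if not coded:
            # No context predicted the symbol: fall back to a uniform byte.
            total_bits += 8.0

        # Adaptive step: update the statistics for every context order.
        for order in range(min(max_order, i) + 1):
            counts[data[i - order:i]][sym] += 1

    return total_bits
```

For example, `estimate_bits(text, max_order=3) / len(text)` gives an estimated rate in bits/character for a byte string `text`. The sketch omits refinements such as excluding symbols already rejected at higher orders, so its figures should not be compared with the results reported in the paper.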
