Data compression with finite windows

Several methods are presented for adaptive, invertible data compression in the style of Ziv and Lempel's first textual substitution proposal. For the first two methods, the article describes modifications of McCreight's suffix tree data structure that support cyclic maintenance of a window on the most recent source characters. A percolating update is used to keep node positions within the window, and the updating process is shown to have constant amortized cost. Other methods explore the tradeoffs between compression time, expansion time, data structure size, and amount of compression achieved. The article includes a graph-theoretic analysis of the compression penalty incurred by our codeword selection policy in comparison with an optimal policy, and it includes empirical studies of the performance of various adaptive compressors from the literature.
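The abstract combines two ideas: Ziv-Lempel-style compression that may only copy from a finite window of recent characters, and a comparison between a codeword selection policy and an optimal one. The Python sketch below illustrates both under stated assumptions. The window size, match-length limits and bit costs are hypothetical. Matches are found by a brute-force scan rather than with the article's suffix tree. "Take the longest copy" is assumed as the selection policy for illustration and may differ from the article's. The optimal policy is computed as a shortest path over text positions, which is a standard construction and not necessarily the article's graph-theoretic analysis.

```python
from typing import List, Tuple, Union

# Hypothetical parameters chosen for illustration; not taken from the article.
WINDOW = 4096     # a copy may reach back at most this many characters
MIN_MATCH = 2     # shortest copy the coder will emit
MAX_MATCH = 16    # longest copy one codeword can describe
LITERAL_BITS = 9  # assumed cost: 1 flag bit + 8-bit character
COPY_BITS = 17    # assumed cost: 1 flag bit + 12-bit distance + 4-bit length

# A token is either a literal character or a (distance, length) copy.
Token = Union[str, Tuple[int, int]]


def longest_match(data: str, i: int) -> Tuple[int, int]:
    """Scan the window for the longest match starting at i; returns (distance, length).

    A linear scan stands in for the article's suffix tree, purely for brevity.
    """
    best_dist, best_len = 0, 0
    for j in range(max(0, i - WINDOW), i):
        k = 0
        # The match may run past i (overlapping copy); decode() handles this.
        while i + k < len(data) and k < MAX_MATCH and data[j + k] == data[i + k]:
            k += 1
        if k > best_len:
            best_dist, best_len = i - j, k
    return best_dist, best_len


def greedy_parse(data: str) -> List[Token]:
    """Policy assumed for illustration: take the longest copy, else a literal."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        dist, length = longest_match(data, i)
        if length >= MIN_MATCH:
            tokens.append((dist, length))
            i += length
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def optimal_parse(data: str) -> List[Token]:
    """Cheapest parse under the cost model, found as a shortest path.

    Node i means "the first i characters are encoded"; a literal is an edge
    i -> i+1 and a copy of length l is an edge i -> i+l. Every prefix of a
    match is also a match, so any l in [MIN_MATCH, longest] is available.
    Edges only point forward, so one right-to-left pass suffices.
    """
    n = len(data)
    cost = [0] * (n + 1)            # cost[i] = fewest bits to encode data[i:]
    choice: List[Token] = [""] * n  # first token of that cheapest encoding
    for i in range(n - 1, -1, -1):
        cost[i], choice[i] = LITERAL_BITS + cost[i + 1], data[i]
        dist, longest = longest_match(data, i)
        for length in range(MIN_MATCH, longest + 1):
            if COPY_BITS + cost[i + length] < cost[i]:
                cost[i], choice[i] = COPY_BITS + cost[i + length], (dist, length)
    tokens: List[Token] = []
    i = 0
    while i < n:
        tok = choice[i]
        tokens.append(tok)
        i += 1 if isinstance(tok, str) else tok[1]
    return tokens


def decode(tokens: List[Token]) -> str:
    """Invert either parse; copying one character at a time allows overlap."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)
        else:
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
    return "".join(out)


def cost_bits(tokens: List[Token]) -> int:
    return sum(LITERAL_BITS if isinstance(t, str) else COPY_BITS for t in tokens)


s = "ab#bcdefg$abcdefg"
g, o = greedy_parse(s), optimal_parse(s)
assert decode(g) == s and decode(o) == s
print(cost_bits(g), cost_bits(o))  # 124 116
```

In the sample string the longest-copy policy copies "ab" and then "cdefg", while the cheapest parse emits "a" as a literal and then copies "bcdefg", saving 8 bits under the assumed costs. Both parses decode to the same text because the decoder sees exactly the same window the encoder used.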

[1] David A. Huffman. A method for the construction of minimum-redundancy codes, 1952, Proceedings of the IRE.

[2] G. Basharin. On a Statistical Estimate for the Entropy of a Sequence of Independent Random Variables, 1959.

[3] Journal of the Association for Computing Machinery, 1961, Nature.

[4] Norman Abramson. Information theory and coding, 1963.

[5] Solomon W. Golomb. Run-length encodings (Corresp.), 1966, IEEE Trans. Inf. Theory.

[6] G. Dantzig, et al. Finding a cycle in a graph with minimum cost to time ratio with application to a ship routing problem, 1966.

[7] Donald R. Morrison. PATRICIA: Practical Algorithm To Retrieve Information Coded in Alphanumeric, 1968, J. ACM.

[8] Peter Weiner. Linear Pattern Matching Algorithms, 1973, SWAT.

[9] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, 1973, Addison-Wesley.

[10] Peter Elias. Universal codeword sets and representations of the integers, 1975, IEEE Trans. Inf. Theory.

[11] Edward M. McCreight. A Space-Economical Suffix Tree Construction Algorithm, 1976, JACM.

[12] Richard Clark Pasco. Source coding algorithms for fast data compression, 1976.

[13] Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.

[14] Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[15] Robert G. Gallager. Variations on a theme by Huffman, 1978, IEEE Trans. Inf. Theory.

[16] Jacob Ziv. Coding theorems for individual sequences, 1978, IEEE Trans. Inf. Theory.

[17] Shimon Even and Michael Rodeh. Economical encoding of commas between strings, 1978, CACM.

[18] Jorma Rissanen and Glen G. Langdon. Arithmetic Coding, 1979.

[19] R. Hunter, et al. International digital facsimile coding standards, 1980, Proceedings of the IEEE.

[20] Mauro Guazzo. A general minimum-redundancy source-coding algorithm, 1980, IEEE Trans. Inf. Theory.

[21] Jorma Rissanen and Glen G. Langdon. Universal modeling and coding, 1981, IEEE Trans. Inf. Theory.

[22] Glen G. Langdon and Jorma Rissanen. Compression of Black-White Images with Arithmetic Coding, 1981, IEEE Trans. Commun.

[23] Cliff B. Jones. An efficient coding system for long source sequences, 1981, IEEE Trans. Inf. Theory.

[24] Michael Rodeh, et al. Linear Algorithm for Data Compression via String Matching, 1981, JACM.

[25] James A. Storer and Thomas G. Szymanski. Data compression via textual substitution, 1982, JACM.

[26] John Hobby, et al. Using string matching to compress Chinese characters, 1982.

[27] Glen G. Langdon, et al. A note on the Ziv-Lempel model for compressing individual sequences, 1983, IEEE Trans. Inf. Theory.

[28] Glen G. Langdon and Jorma Rissanen. A Double-Adaptive File Compression Algorithm, 1983, IEEE Trans. Commun.

[29] Terry A. Welch. A Technique for High-Performance Data Compression, 1984, Computer.

[30] John G. Cleary and Ian H. Witten. Data Compression Using Adaptive Coding and Partial String Matching, 1984, IEEE Trans. Commun.

[31] Jeffrey Scott Vitter. Design and analysis of dynamic Huffman coding, 1985, 26th Annual Symposium on Foundations of Computer Science (SFCS 1985).

[32] Victor S. Miller and Mark N. Wegman. Variations on a theme by Ziv and Lempel, 1985.

[33] Matti Jakobsson, et al. Compression of character strings by an adaptive dictionary, 1985, BIT.

[34] Donald E. Knuth. Dynamic Huffman Coding, 1985, J. Algorithms.

[35] Jon L. Bentley, et al. A Locally Adaptive Data, 1986.

[36] T. Bell. Better OPM/L Text Compression, 1986, IEEE Trans. Commun.

[37] Gordon V. Cormack and R. Nigel Horspool. Data Compression Using Dynamic Markov Modelling, 1987, Comput. J.

[38] Ian H. Witten, et al. Arithmetic coding for data compression, 1987, CACM.

[39] Timothy C. Bell, John G. Cleary, and Ian H. Witten. Text Compression, 1990, Prentice Hall.