Compression and Entropy

The connection between text compression and the entropy of a source seems to be well known but poorly documented. We partially remedy this situation by showing that the topological entropy is a lower bound on the compression ratio of any compressor. We also show that, for factorial sources, the 1978 version of the Ziv-Lempel compression algorithm achieves this lower bound.
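
For orientation, a minimal statement of the quantities involved is sketched below; the notation, the choice of logarithm base, and the symbol $\rho$ are assumptions made here and may differ from the paper's own conventions.

For a factorial language $L \subseteq A^*$ (one closed under taking factors of its words), any word of length $m+n$ in $L$ splits into a factor of length $m$ followed by a factor of length $n$, so $\log\,\bigl|L \cap A^n\bigr|$ is subadditive in $n$ and the limit

\[
  h(L) \;=\; \lim_{n \to \infty} \frac{1}{n}\,\log_{|A|}\,\bigl|L \cap A^n\bigr|
\]

exists by Fekete's lemma and lies in $[0,1]$. With $\rho(C)$ standing for the asymptotic number of output symbols per input symbol produced by a lossless compressor $C$ on words of $L$, the results above read $\rho(C) \ge h(L)$ for every compressor, with the 1978 Ziv-Lempel scheme attaining equality on factorial sources.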
