Data compression using antidictionaries

We present a new text-compression scheme based on forbidden words (an "antidictionary"), that is, words that never occur in the text. We prove that our algorithms attain the entropy for balanced binary sources, and that they run in linear time. One of the main advantages of this approach is that it produces very fast decompressors. A second advantage is a synchronization property that helps when searching compressed data and allows parallel compression. The techniques used in this paper come from information theory and finite automata.
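To make the idea concrete, here is a minimal Python sketch of compression with an antidictionary over the binary alphabet. It is an illustration under stated assumptions, not the authors' implementation: the function names (`minimal_forbidden_words`, `forced_symbol`, `compress`, `decompress`) and the `max_len` bound are hypothetical, the antidictionary is built by brute force, and the running time is quadratic rather than the linear time obtained with automata in the paper. The principle is that whenever a suffix of the text read so far, extended by one bit, would form a forbidden word, the next bit must be the other one and can be omitted from the output.

```python
def minimal_forbidden_words(text, max_len):
    """Brute-force antidictionary: minimal forbidden words of length <= max_len.

    A word w is minimal forbidden if w does not occur in text
    while both w[:-1] and w[1:] do occur.
    """
    n = len(text)
    # All factors (substrings) of length 0..max_len, including the empty word.
    factors = {text[i:j]
               for i in range(n + 1)
               for j in range(i, min(n, i + max_len) + 1)}
    ad = set()
    for u in factors:
        if len(u) >= max_len:
            continue
        for b in "01":
            w = u + b
            if w not in factors and w[1:] in factors:
                ad.add(w)
    return ad


def forced_symbol(prefix, ad, max_len):
    """Return the bit forced by the antidictionary after prefix, or None."""
    for k in range(min(len(prefix), max_len - 1) + 1):
        u = prefix[len(prefix) - k:]  # suffix of length k (empty when k == 0)
        for b in "01":
            if u + b in ad:
                # u + b is forbidden, so the next bit must be the other one.
                return "1" if b == "0" else "0"
    return None


def compress(text, ad, max_len):
    """Keep only the bits that the antidictionary cannot predict."""
    out = []
    for i, c in enumerate(text):
        if forced_symbol(text[:i], ad, max_len) is None:
            out.append(c)
    return "".join(out)


def decompress(code, n, ad, max_len):
    """Rebuild the n-bit text from the kept bits and the antidictionary."""
    bits = iter(code)
    text = ""
    while len(text) < n:
        f = forced_symbol(text, ad, max_len)
        text += f if f is not None else next(bits)
    return text


if __name__ == "__main__":
    source = "01" * 8
    ad = minimal_forbidden_words(source, 3)
    packed = compress(source, ad, 3)
    assert decompress(packed, len(source), ad, 3) == source
    print(sorted(ad), packed)
```

In this sketch the decoder needs the antidictionary and the original length `n` in addition to the kept bits. For the periodic string `01` repeated eight times, the antidictionary is {00, 11} and only the first bit has to be stored.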
