Length of minimal forbidden words on a stationary ergodic source

An antidictionary, the set of minimal forbidden words of a given string, is particularly useful for data compression. We derive the average length M_n of the minimal forbidden words of a string of length n emitted by a stationary ergodic source with entropy H over a finite alphabet. We prove that log n / M_n converges to H in probability as n → ∞. The proof relies on the Wyner-Ziv result on the connection between entropy and recurrence time for ergodic processes. Simulation results on a memoryless binary information source are consistent with the theorem.
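To make the quantities in the abstract concrete, the following is a minimal brute-force sketch, not the authors' method or their simulation code. It assumes the usual definition of a minimal forbidden word: a word that does not occur in the string while both its longest proper prefix and its longest proper suffix do. All function names (`minimal_forbidden_words`, `average_mfw_length`, `binary_entropy`) and the parameter choices in the demo are illustrative assumptions. Entropy is measured in bits, so the logarithm is taken to base 2.

```python
import math
import random


def minimal_forbidden_words(s, alphabet):
    """Return the minimal forbidden words of s by brute force.

    A word w = a u b is minimal forbidden when w does not occur in s
    but both a u and u b do. A single letter that never occurs in s
    is a minimal forbidden word of length 1.
    """
    n = len(s)
    present = set(s)
    mfws = [a for a in alphabet if a not in present]
    prev = present  # factors of length k - 1
    inner_all_distinct = False  # are all factors of length k - 2 unique?
    for k in range(2, n + 2):
        # The inner part u (length k - 2) of a minimal forbidden word must
        # occur at least twice, so stop once no factor of that length repeats.
        if inner_all_distinct:
            break
        cur = {s[i:i + k] for i in range(n - k + 1)}  # factors of length k
        for v in prev:
            for b in alphabet:
                w = v + b
                if w not in cur and w[1:] in prev:
                    mfws.append(w)
        # prev holds the factors of length k - 1, which has n - k + 2 positions.
        inner_all_distinct = len(prev) == n - k + 2
        prev = cur
    return mfws


def average_mfw_length(s, alphabet):
    """Average length of the minimal forbidden words of s (M_n in the abstract)."""
    words = minimal_forbidden_words(s, alphabet)
    return sum(len(w) for w in words) / len(words) if words else float("nan")


def binary_entropy(p):
    """Entropy in bits of a memoryless binary source with P(1) = p, 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    p = 0.3                 # assumed source parameter, for illustration only
    for n in (500, 2000, 8000):
        s = "".join("1" if rng.random() < p else "0" for _ in range(n))
        m_n = average_mfw_length(s, "01")
        print(n, math.log2(n) / m_n, binary_entropy(p))
```

For the string `aabbabb` over the alphabet {a, b}, this sketch returns the five words aaa, aba, baa, bbb and babba, whose average length is 3.4. The demo prints log n / M_n next to H for a few string lengths; since the theorem is asymptotic, close agreement at small n should not be expected. The running time grows with both n and the length of the longest repeated factor, so a suffix-tree or suffix-array construction would be preferable for long strings.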

[1]  Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.

[2]  Hiroyoshi Morita, et al. A tight upper bound on the size of the antidictionary of a binary string, 2005.

[3]  H. Morita, et al. On the sliding window variations of antidictionary data compression using dynamic suffix trees, 2008, 2008 International Symposium on Information Theory and Its Applications.

[4]  Hiroyoshi Morita, et al. On-line Electrocardiogram Lossless Compression Using Antidictionary-Based Methods, 2009.

[5]  Ota Takahiro, et al. Branch Prediction Based on Antidictionary Tree, 2007.

[6]  Aaron D. Wyner, et al. Some asymptotic properties of the entropy of a stationary ergodic data source with applications to data compression, 1989, IEEE Trans. Inf. Theory.

[7]  Mikihiko Nishiara, et al. On Construction of Reversible Variable-Length Codes Including Resynchronization Markers as Codewords, 2006.

[8]  Julien Fayolle, et al. Compression de données sans perte et combinatoire analytique, 2006.

[9]  Jan Holub, et al. DCA Using Suffix Arrays, 2008, Data Compression Conference (DCC 2008).

[10]  Alistair Moffat, et al. Compression and Coding Algorithms, 2005, IEEE Trans. Inf. Theory.

[11]  A. Restivo, et al. Data compression using antidictionaries, 2000, Proceedings of the IEEE.

[12]  Antonio Restivo, et al. Automata and Forbidden Words, 1998, Inf. Process. Lett.

[13]  Julien Fayolle. Analysis of the Size of Antidictionary in , 2008, CPM.

[14]  Hiroyoshi Morita, et al. On the On-line Arithmetic Coding Based on Antidictionaries with Linear Complexity, 2007, 2007 IEEE International Symposium on Information Theory.

[15]  Antonio Restivo, et al. Word assembly through minimal forbidden words, 2006, Theor. Comput. Sci.

[16]  D. J. Wheeler, et al. A Block-sorting Lossless Data Compression Algorithm, 1994.

[17]  Chiara Epifanio, et al. A Trie-Based Approach for Compacting Automata, 2004, CPM.

[18]  Hiroyoshi Morita, et al. On the Construction of an Antidictionary with Linear Complexity Using the Suffix Tree, 2007, IEICE Trans. Fundam. Electron. Commun. Comput. Sci.