On a universal antidictionary coding for stationary ergodic sources with finite alphabet

This paper shows that a two-pass universal antidictionary coding method is asymptotically optimal for stationary ergodic sources over a finite alphabet. To prove this result, we propose a new compact tree representation of an antidictionary. We also extend compression by substring enumeration (CSE), the lossless compression algorithm proposed by Dubé and Beaudoin, from the binary alphabet to a q-ary alphabet (q ≥ 2), and we use this extension to efficiently compress the antidictionary of the input sequence to be encoded.
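For readers unfamiliar with the term, an antidictionary of a sequence is commonly taken to be its set of minimal forbidden words: words that do not occur in the sequence although their longest proper prefix and longest proper suffix both do. The following is a minimal brute-force sketch of that definition over an arbitrary finite alphabet. It is not the compact tree representation or the coding method of the paper, and the function names `factors` and `antidictionary` are hypothetical, chosen only for illustration.

```python
def factors(x):
    """Return the set of all substrings of x, including the empty string."""
    return {x[i:j] for i in range(len(x) + 1) for j in range(i, len(x) + 1)}


def antidictionary(x, alphabet):
    """Brute-force set of minimal forbidden words of x (illustrative only).

    A word w = u + a is minimal forbidden when w does not occur in x,
    while its longest proper prefix u and longest proper suffix w[1:] do.
    """
    facts = factors(x)
    mfw = set()
    for u in facts:              # u occurs in x, so the prefix condition holds
        for a in alphabet:       # extend u by one symbol of the q-ary alphabet
            w = u + a
            if w not in facts and w[1:] in facts:
                mfw.add(w)
    return mfw


if __name__ == "__main__":
    # Binary example (q = 2)
    print(sorted(antidictionary("01101", "01")))
    # A symbol absent from the sequence is itself a minimal forbidden word
    print(sorted(antidictionary("aaa", "ab")))
```

For the binary sequence 01101 this yields the words 00, 010, 111 and 1011. The enumeration is quadratic or worse in the sequence length and is meant only to make the definition concrete; practical constructions rely on suffix-based data structures.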
