The universality of grammar-based codes for sources with countably infinite alphabets

In this paper, we investigate the performance of grammar-based codes for sources with countably infinite alphabets. Let Λ denote an arbitrary class of stationary, ergodic sources with a countably infinite alphabet. It is shown that grammar-based codes can be modified so that they are universal with respect to any Λ if and only if there exists a universal code for Λ. Moreover, upper bounds on the worst case redundancies of grammar-based codes among large sets of length-n individual sequences from a countably infinite alphabet are established. Depending upon the conditions satisfied by length-n individual sequences, these bounds range from O(log log n / log n) to O(1 / log^(1-α) n) for some 0 < α < 1. These results complement the previous universality and redundancy results in the literature on the performance of grammar-based codes for sources with finite alphabets.
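To give a feel for how slowly the two redundancy rates quoted above decay, the following sketch evaluates them numerically. It is an illustration only and not part of the paper's analysis: the hidden constants in the O(·) bounds are dropped, the logarithm is assumed to be base 2, the value α = 0.5 is picked arbitrarily, and the function names `loglog_rate` and `power_log_rate` are hypothetical.

```python
import math


def loglog_rate(n: int) -> float:
    """Return log log n / log n (base 2 assumed, hidden constant omitted)."""
    log_n = math.log2(n)
    return math.log2(log_n) / log_n


def power_log_rate(n: int, alpha: float) -> float:
    """Return 1 / (log n)^(1 - alpha) (base 2 assumed, hidden constant omitted)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 1.0 / math.log2(n) ** (1.0 - alpha)


if __name__ == "__main__":
    alpha = 0.5  # arbitrary illustrative choice; the abstract only states 0 < alpha < 1
    for k in (16, 32, 64, 128):
        n = 2 ** k  # sequence length
        print(f"n = 2^{k}: loglog = {loglog_rate(n):.4f}, "
              f"power-log = {power_log_rate(n, alpha):.4f}")
```

Both quantities tend to zero as n grows, but only logarithmically in n. Under these assumptions the second rate falls more slowly than the first once n is large, and the gap widens as α approaches 1.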
