Bounds of compression of unknown alphabets

It is known that the redundancy of universally compressing i.i.d. strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols and its pattern, the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings drawn from any alphabet, possibly infinite or even unknown, can be universally compressed with diminishing worst-case redundancy, both in block and sequentially.
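To illustrate the split into symbols and pattern, here is a minimal Python sketch. It assumes the usual convention that each symbol is replaced by the index of its first appearance, counting from 1; the function names `pattern`, `symbols_in_order`, and `reconstruct` are hypothetical and chosen only for this illustration.

```python
def pattern(seq):
    """Replace each symbol by the 1-based order of its first appearance."""
    first_seen = {}  # symbol -> index of first appearance (assumed convention)
    out = []
    for s in seq:
        if s not in first_seen:
            first_seen[s] = len(first_seen) + 1
        out.append(first_seen[s])
    return out


def symbols_in_order(seq):
    """List the distinct symbols in the order they first appear."""
    seen = {}
    for s in seq:
        seen.setdefault(s, None)  # dicts preserve insertion order
    return list(seen)


def reconstruct(symbols, pat):
    """Recover the original sequence from its symbols and its pattern."""
    return [symbols[i - 1] for i in pat]


word = "abracadabra"
pat = pattern(word)            # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
syms = symbols_in_order(word)  # ['a', 'b', 'r', 'c', 'd']
restored = "".join(reconstruct(syms, pat))
```

The pattern does not depend on which symbols actually occur: "abracadabra" and any relabeling of its letters share the same pattern, so the pattern can be described without knowing the alphabet.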
