Multialphabet coding with separate alphabet description

For lossless universal source coding of memoryless sequences whose alphabet size is not known a priori (multialphabet coding), the alphabet of the sequence must be described in addition to the sequence itself. An efficient description of the alphabet is usually possible only by taking some additional information into account. We show that the two descriptions can be separated so that the actual sequence is encoded independently of the alphabet description, and we present sequential coding methods for such sequences. These methods apply to coding schemes in which the alphabet description becomes available sequentially, such as PPM (prediction by partial matching).
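A minimal Python sketch of the separation idea, under stated assumptions. The sequence is split into a pattern (each symbol replaced by the index of its first appearance) and an alphabet list (the distinct symbols in order of first appearance). The pattern alone is then charged an ideal sequential code length using a simple escape estimator in the style of PPM method A, where a never-seen symbol has probability 1/(t+1) and a seen symbol j has probability n_j/(t+1). This estimator and the hypothetical names `split_pattern_and_alphabet` and `pattern_code_length` are illustrative choices, not the authors' method; how the alphabet list itself is described is left out.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def split_pattern_and_alphabet(seq: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Split a sequence into its pattern (first-occurrence indices) and its
    alphabet (distinct symbols in order of first appearance)."""
    index: Dict[Hashable, int] = {}
    alphabet: List[Hashable] = []
    pattern: List[int] = []
    for s in seq:
        if s not in index:
            index[s] = len(alphabet)  # next unused index
            alphabet.append(s)
        pattern.append(index[s])
    return pattern, alphabet


def pattern_code_length(pattern: Sequence[int]) -> float:
    """Ideal sequential code length (bits) of a pattern.

    Assumption: PPM-method-A-style escape, P(new) = 1/(t+1) and
    P(j) = n_j/(t+1). A new symbol in a pattern is always the next unused
    index, so no symbol identity has to be coded after an escape; that
    information lives entirely in the separate alphabet list.
    """
    counts: List[int] = []  # counts[j] = occurrences of pattern index j so far
    bits = 0.0
    for t, j in enumerate(pattern):
        if j == len(counts):  # escape: first occurrence of a new symbol
            p = 1.0 / (t + 1)
            counts.append(0)
        elif 0 <= j < len(counts):
            p = counts[j] / (t + 1)
        else:
            raise ValueError("not a valid pattern")
        bits -= math.log2(p)
        counts[j] += 1
    return bits


pattern, alphabet = split_pattern_and_alphabet("abracadabra")
print(pattern, alphabet)
print(round(pattern_code_length(pattern), 3))
```

Because the pattern coder never sees actual symbols, `"abba"` and `"xyyx"` both map to the pattern `[0, 1, 1, 0]` and receive the same code length; only their separately described alphabet lists differ.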

Ben J. M. Smeets | Yuri M. Shtarkov | Jan Åberg
