LARGE ALPHABET CODING AND PREDICTION THROUGH POISSONIZATION AND TILTING

This paper introduces a convenient strategy for compressing and predicting sequences of independent, identically distributed random variables drawn from a large alphabet of size m. The sample size is also allowed to vary. Using a Poisson model, together with a tilting method, simplifies both implementation and analysis because under the Poisson model the symbol counts are independent. The resulting strategy is optimal within the class of distributions satisfying a moment condition, and close to optimal for a smaller class: the class of distributions satisfying an analogous condition on the counts. Moreover, the method can be used to code and predict sequences in a subset whose tail counts satisfy a given condition, and it can also be applied to envelope classes.
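The independence gained by Poissonization can be checked numerically. The sketch below is our own illustration, not the paper's coding scheme, and it does not cover the tilting step. It assumes that the sample size N is Poisson with mean lam and that, given N = n, the counts of the m symbols are multinomial with probabilities p_1, ..., p_m. Under these assumptions the joint probability of the counts equals a product of independent Poisson(lam * p_j) probabilities. All function names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    # P(K = k) for K ~ Poisson(mean)
    return math.exp(-mean) * mean**k / math.factorial(k)

def multinomial_pmf(counts, probs):
    # P(counts | n = sum(counts)) under a multinomial with the given probabilities
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # exact: each partial quotient is an integer
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q**c
    return p

def poissonized_joint(counts, probs, lam):
    # Draw N ~ Poisson(lam), then the counts given N are multinomial
    return poisson_pmf(sum(counts), lam) * multinomial_pmf(counts, probs)

def independent_product(counts, probs, lam):
    # Product of independent Poisson(lam * p_j) probabilities, one per symbol
    return math.prod(poisson_pmf(c, lam * q) for c, q in zip(counts, probs))

if __name__ == "__main__":
    counts = [3, 0, 2, 1]
    probs = [0.4, 0.1, 0.3, 0.2]  # must sum to 1 for the identity to hold
    print(poissonized_joint(counts, probs, 5.0))
    print(independent_product(counts, probs, 5.0))
```

Both printed values agree up to floating-point rounding. The identity relies on the probabilities summing to 1, so that the factor exp(-lam) splits into the per-symbol factors exp(-lam * p_j).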