Minimax Compression and Large Alphabet Approximation Through Poissonization and Tilting

This paper introduces a convenient strategy for coding and predicting sequences of independent, identically distributed random variables generated from a large alphabet of size $m$. In particular, the size of the sample is allowed to be variable. Employing a Poisson sampling model together with a tilting method simplifies both implementation and analysis by making the symbol counts independent. The resulting strategy is optimal within the class of distributions satisfying a moment condition, and it is close to optimal for the class of all i.i.d. distributions on strings of a given length. The method can also be used to code and predict strings under a condition on the tail of the ordered counts, and it can be applied to distributions in an envelope class. Moreover, we show that our model permits exact computation of the minimax optimal code, for all alphabet sizes, when conditioning on the size of the sample.
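
As a brief illustration of the independence afforded by Poissonization (a standard fact stated here for context, not a reproduction of the paper's derivations): if the sample size $N$ is drawn $\mathrm{Poisson}(n)$ and, given $N$, the string consists of $N$ i.i.d. draws from $p = (p_1, \dots, p_m)$, then the symbol counts $N_j = \#\{i : X_i = j\}$ factor into independent Poisson variables,

```latex
% Poissonization: N ~ Poisson(n), X_1,...,X_N i.i.d. ~ p on {1,...,m}
% The joint law of the counts N_j factors across symbols:
\[
  \Pr\bigl(N_1 = k_1, \dots, N_m = k_m\bigr)
    \;=\; \prod_{j=1}^{m} e^{-n p_j}\,\frac{(n p_j)^{k_j}}{k_j!},
\]
% i.e. N_1,...,N_m are independent with N_j ~ Poisson(n p_j),
% so code lengths and predictive probabilities decompose coordinatewise.
```

so analysis and coding can proceed one symbol at a time rather than over the dependent multinomial counts.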
