Paper Information - Large Alphabet Compression and Predictive Distributions through Poissonization and Tilting


This paper introduces a convenient strategy for coding and predicting sequences of independent, identically distributed random variables generated from a large alphabet of size $m$. In particular, the sample size is allowed to vary. Using a Poisson model together with a tilting method simplifies both implementation and analysis through independence. The resulting strategy is optimal within the class of distributions satisfying a moment condition, and is close to optimal for the class of all i.i.d. distributions on strings of a given length. Moreover, the method can be used to code and predict strings subject to a condition on the tail of the ordered counts. It can also be applied to distributions in an envelope class.
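The independence that the abstract attributes to the Poisson model can be illustrated with a small numerical check. The sketch below is not the paper's coding strategy. It only verifies the standard Poissonization fact that, when the sample size is Poisson with mean `lam` and the symbols are i.i.d. with probabilities `probs`, the symbol counts are independent Poisson variables with means `lam * probs[j]`. All function names, the three-symbol alphabet, and the chosen numbers are assumptions made for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def multinomial_pmf(counts, probs):
    """Probability of a count vector given the total sample size sum(counts)."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)  # stays an exact integer at every step
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q**c
    return p


def poissonized_joint_pmf(counts, probs, lam):
    """P(N = n) * P(counts | N = n), with a random sample size N ~ Poisson(lam)."""
    return poisson_pmf(sum(counts), lam) * multinomial_pmf(counts, probs)


def product_of_poissons(counts, probs, lam):
    """Product of independent Poisson(lam * p_j) probabilities, one per symbol."""
    out = 1.0
    for c, q in zip(counts, probs):
        out *= poisson_pmf(c, lam * q)
    return out


# Hypothetical example: alphabet of size m = 3, expected sample size 4.
probs = (0.5, 0.2, 0.3)
counts = (2, 0, 3)
lam = 4.0

joint = poissonized_joint_pmf(counts, probs, lam)
factored = product_of_poissons(counts, probs, lam)
print(math.isclose(joint, factored, rel_tol=1e-12))
```

Because the joint probability factors across symbols, each count can be handled separately, which is the kind of simplification through independence that the abstract refers to.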

Andrew R. Barron | Xiao Yang
