Tight bounds for universal compression of large alphabets

Over the past decade, several papers, e.g., [1-7] and references therein, have considered universal compression of sources over large alphabets, often using patterns to avoid infinite redundancy. Improving on previous results, we prove tight bounds on expected- and worst-case pattern redundancy, in particular closing a decade-long gap and showing that the worst-case pattern redundancy of i.i.d. distributions is Θ(n^{1/3}).
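
As background for the result above, the pattern of a sequence replaces each symbol by the order of its first appearance, so two sequences that differ only in the identities of their symbols share the same pattern; this is what allows pattern coding to achieve finite redundancy even over unbounded alphabets. The following minimal Python sketch (the helper name `pattern` is ours, not from the paper) illustrates the mapping:

```python
def pattern(sequence):
    """Return the pattern of a sequence: each symbol is replaced by the
    (1-based) order of its first appearance, so symbol identities are
    discarded and only the repetition structure remains."""
    first_seen = {}  # symbol -> order of first appearance
    result = []
    for symbol in sequence:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen) + 1
        result.append(first_seen[symbol])
    return result


# 'a' appears first, 'b' second, 'r' third, 'c' fourth, 'd' fifth.
print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```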

[1] Aaron B. Wagner et al., Near-lossless compression of large alphabet sources, 46th Annual Conference on Information Sciences and Systems (CISS), 2012.

[2] Michael B. Pursley et al., Efficient universal noiseless source codes, IEEE Trans. Inf. Theory, 1981.

[3] Stanley F. Chen et al., An Empirical Study of Smoothing Techniques for Language Modeling, ACL, 1996.

[4] Jorma Rissanen, Fisher information and stochastic complexity, IEEE Trans. Inf. Theory, 1996.

[5] Michael Drmota et al., Precise minimax redundancy and regret, IEEE Trans. Inf. Theory, 2004.

[6] A. Orlitsky et al., Always Good Turing: Asymptotically Optimal Probability Estimation, Science, 2003.

[7] T. Cover, Universal Portfolios, 1996.

[8] W. Szpankowski, On asymptotics of certain recurrences arising in universal coding, 1998.

[9] Wojciech Szpankowski et al., Minimax redundancy for large alphabets, IEEE International Symposium on Information Theory (ISIT), 2010.

[10] Neri Merhav et al., Universal Prediction, IEEE Trans. Inf. Theory, 1998.

[11] Slava M. Katz, Estimation of probabilities from sparse data for the language model component of a speech recognizer, IEEE Trans. Acoust. Speech Signal Process., 1987.

[12] Alon Orlitsky et al., On Modeling Profiles Instead of Values, UAI, 2004.

[13] Sanjeev R. Kulkarni et al., A Better Good-Turing Estimator for Sequence Probabilities, IEEE International Symposium on Information Theory (ISIT), 2007.

[14] Aurélien Garivier, A Lower-Bound for the Maximin Redundancy in Pattern Coding, Entropy, 2009.

[15] Michael Mitzenmacher and Eli Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005.

[16] Alon Orlitsky et al., Competitive Closeness Testing, COLT, 2011.

[17] Nicolò Cesa-Bianchi and Gábor Lugosi, Prediction, Learning, and Games, Cambridge University Press, 2006.

[18] Alon Orlitsky et al., Tight Bounds on Profile Redundancy and Distinguishability, NIPS, 2012.

[19] Alon Orlitsky et al., Universal compression of memoryless sources over unknown alphabets, IEEE Trans. Inf. Theory, 2004.

[20] B. Fitingof, Coding in the case of unknown and changing message statistics, 1966.

[21] Gil I. Shamir, Universal Lossless Compression With Unknown Alphabets - The Average Case, IEEE Trans. Inf. Theory, 2006.

[22] Aurélien Garivier et al., Coding on Countably Infinite Alphabets, IEEE Trans. Inf. Theory, 2008.

[23] Tsachy Weissman et al., On the Entropy Rate of Pattern Processes, IEEE Trans. Inf. Theory, 2005.

[24] Peter Elias, Universal codeword sets and representations of the integers, IEEE Trans. Inf. Theory, 1975.

[25] Lee D. Davisson, Universal noiseless coding, IEEE Trans. Inf. Theory, 1973.

[26] Frans M. J. Willems et al., The context-tree weighting method: basic properties, IEEE Trans. Inf. Theory, 1995.

[27] A. Barron et al., Asymptotic minimax regret for data compression, gambling and prediction, Proceedings of the IEEE International Symposium on Information Theory, 1997.