About Adaptive Coding on Countable Alphabets: Max-Stable Envelope Classes

In this paper, we study the problem of lossless universal source coding for stationary memoryless sources on countably infinite alphabets. This task is generally not achievable without restricting the class of sources over which universality is desired. Building on our prior work, we propose natural families of sources characterized by a common dominating envelope. We particularly emphasize the notion of adaptivity, which is the ability to perform as well as an oracle that knows the envelope, without actually knowing it. This is closely related to the notion of hierarchical universal source coding, with the important difference that families of envelope classes are not discretely indexed and not necessarily nested. Our contribution is to extend the classes of envelopes over which adaptive universal source coding is possible, namely by including max-stable (heavy-tailed) envelopes, which are excellent models in many applications, such as natural language modeling. We derive a minimax lower bound on the redundancy of any code on such envelope classes, including a code designed by an oracle that knows the envelope. We then propose a constructive code that does not use knowledge of the envelope. The code is computationally efficient and is structured to use an expanding threshold for auto-censoring (ETAC); we therefore dub it the ETAC-code. We prove that the ETAC-code achieves the lower bound on the minimax redundancy within a factor logarithmic in the sequence length, and can therefore be qualified as a near-adaptive code over families of heavy-tailed envelopes. For finite and light-tailed envelopes, the penalty is even smaller, and the same code closely matches previous results that explicitly made the light-tailed assumption. Our technical results are founded on methods from regular variation theory and concentration of measure.
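
The objects named in the abstract can be written in symbols. The block below is a sketch using the standard definitions from the envelope-class literature. The notation ($f$, $\Lambda_f$, $Q_n$, $R$, $C$) is assumed for illustration and is not fixed by the abstract. In particular, the constant $C$ and the exact form of the logarithmic factor are placeholders.

```latex
% Envelope class: memoryless sources on {1, 2, ...} whose probability
% mass function is dominated pointwise by the envelope f.
\Lambda_f = \bigl\{ P : P(k) \le f(k) \ \text{for all } k \ge 1 \bigr\}

% Expected redundancy of a coding distribution Q_n when X_1, ..., X_n
% are drawn i.i.d. from P (a Kullback-Leibler divergence).
D\bigl(P^n \,\|\, Q_n\bigr)
  = \mathbb{E}_{P^n}\!\left[ \log \frac{P^n(X_{1:n})}{Q_n(X_{1:n})} \right]

% Minimax redundancy over the class: the best worst-case redundancy
% achievable by a code that may depend on the envelope f.
R\bigl(\Lambda_f^n\bigr)
  = \inf_{Q_n} \, \sup_{P \in \Lambda_f} D\bigl(P^n \,\|\, Q_n\bigr)

% Near-adaptivity (assumed form): a single Q_n, built without knowledge
% of f, that is within a logarithmic factor of the minimax redundancy
% for every envelope f in the family under consideration.
\sup_{P \in \Lambda_f} D\bigl(P^n \,\|\, Q_n\bigr)
  \le C \, \log n \cdot R\bigl(\Lambda_f^n\bigr)
```

Read this way, the lower bound in the abstract concerns $R(\Lambda_f^n)$, and the upper bound concerns the left-hand side of the last display for the ETAC-code.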
