Rényi Divergence and Kullback-Leibler Divergence

Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and it comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and it depends on a parameter called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of σ-algebras, and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders), and present several other minimax results.
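To make the order parameter concrete, here is a minimal sketch (assuming discrete distributions with strictly positive probabilities; the function name renyi_divergence is illustrative and not taken from the paper) that computes the standard quantity D_α(P‖Q) = (1/(α−1)) log Σ_i p_i^α q_i^(1−α) and checks numerically that it approaches the Kullback-Leibler divergence as the order α tends to 1.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Rényi divergence of order `alpha` between discrete distributions p and q.

    Assumes p and q are strictly positive probability vectors of equal length.
    At alpha == 1 the divergence is defined by continuity and equals the
    Kullback-Leibler divergence.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.isclose(alpha, 1.0):
        # Order 1: Kullback-Leibler divergence, sum_i p_i * log(p_i / q_i)
        return float(np.sum(p * np.log(p / q)))
    # General order: log of the power sum, scaled by 1 / (alpha - 1)
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

# Example: orders close to 1 should give values close to the KL divergence.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
kl = renyi_divergence(p, q, 1.0)
for a in (0.5, 0.9, 0.99, 1.01, 2.0):
    print(f"D_{a}(P||Q) = {renyi_divergence(p, q, a):.6f}")
print(f"KL(P||Q)    = {kl:.6f}")
```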
