Projection Theorems for the Rényi Divergence on $\alpha$-Convex Sets

This paper studies forward and reverse projections for the Rényi divergence of order α ∈ (0, ∞) on α-convex sets. The forward projection on such a set is motivated by the work of Tsallis et al. in statistical physics, and the reverse projection is motivated by robust statistics. In a recent work, van Erven and Harremoës proved a Pythagorean inequality for Rényi divergences on α-convex sets under the assumption that the forward projection exists. Continuing this study, a sufficient condition for the existence of a forward projection is proved for probability measures on a general alphabet. For α ∈ (1, ∞), the proof relies on a new Apollonius theorem for the Hellinger divergence, and for α ∈ (0, 1), the proof relies on the Banach–Alaoglu theorem from functional analysis. Further projection results are then obtained in the finite alphabet setting. These include a projection theorem on a specific α-convex set, termed an α-linear family, generalizing a result by Csiszár to α ≠ 1. The solution to this problem yields a parametric family of probability measures which turns out to be an extension of the exponential family; it is termed an α-exponential family. An orthogonality relationship between the α-exponential and α-linear families is established, and it is used to turn the reverse projection on an α-exponential family into a forward projection on an α-linear family. This paper also proves the convergence of an iterative procedure for calculating the forward projection on an intersection of a finite number of α-linear families.
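The central quantity throughout is the Rényi divergence of order α, which on a finite alphabet reduces to a closed-form expression: D_α(P‖Q) = (1/(α−1)) log Σ_x P(x)^α Q(x)^{1−α} for α ≠ 1, with the Kullback–Leibler divergence recovered in the limit α → 1. A minimal sketch of this definition follows; the function name and the strict-positivity assumption on both distributions are ours, for illustration only, and do not come from the paper.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(P||Q) (in nats) for finite-alphabet
    distributions p and q, assumed here to be strictly positive."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if alpha == 1.0:
        # The limit alpha -> 1 is the Kullback-Leibler divergence.
        return float(np.sum(p * np.log(p / q)))
    # D_alpha = log( sum_x p^alpha * q^(1-alpha) ) / (alpha - 1)
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))
```

For example, with P = (1/2, 1/2) and Q = (1/4, 3/4), order α = 2 gives D_2(P‖Q) = log(1/4·4 + 1/4·4/3) = log(4/3).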
