On Data-Processing and Majorization Inequalities for f-Divergences with Applications

This paper is focused on the derivation of data-processing and majorization inequalities for f-divergences, and on their applications in information theory and statistics. To make the material accessible, the main results are first stated without proofs; the theorems are then exemplified, together with further related analytical results, interpretations, and information-theoretic applications. One application concerns the performance analysis of list decoding with either fixed or variable list sizes: some earlier bounds on the list decoding error probability are reproduced in a unified way, and new bounds are obtained and exemplified numerically. Another application studies the quality of approximating the probability mass function induced by the leaves of a Tunstall tree by an equiprobable distribution. The compression rates of finite-length Tunstall codes are further analyzed to assess their closeness to the Shannon entropy of a stationary memoryless discrete source. Almost all of the analysis is relegated to the appendices, which form the major part of this manuscript.
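As a concrete companion to the data-processing theme, the following minimal sketch (illustrative only, not code from the paper) evaluates D_f(P||Q) = sum_x q(x) f(p(x)/q(x)) for two standard choices of the convex function f, and checks numerically that passing both P and Q through a common row-stochastic channel W cannot increase the divergence. The alphabet sizes and the random choices of P, Q, and W are assumptions made purely for the demonstration.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P||Q) = sum_x q(x) * f(p(x)/q(x)), for convex f with f(1) = 0.
    Assumes q(x) > 0 wherever p(x) > 0 (absolute continuity)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = q > 0
    return float(np.sum(q[m] * f(p[m] / q[m])))

def f_kl(t):
    """f(t) = t log t (gives the KL divergence, in nats), with 0 log 0 = 0."""
    t = np.asarray(t, float)
    return np.where(t > 0, t * np.log(np.where(t > 0, t, 1.0)), 0.0)

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(5))            # input distribution on 5 points
Q = rng.dirichlet(np.ones(5))            # reference distribution
W = rng.dirichlet(np.ones(3), size=5)    # random 5x3 row-stochastic channel

for name, f in [("KL", f_kl), ("chi^2", lambda t: (t - 1.0) ** 2)]:
    before = f_divergence(P, Q, f)
    after = f_divergence(P @ W, Q @ W, f)   # push both through the channel
    assert after <= before + 1e-12           # data-processing inequality
    print(f"{name}: D_f(P||Q) = {before:.4f} >= D_f(PW||QW) = {after:.4f}")
```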
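The Tunstall-code application can be illustrated in the same spirit. The sketch below grows a Tunstall parsing tree for a memoryless source by repeatedly splitting the most probable leaf, measures how far the leaf-induced probability mass function is from the equiprobable distribution (here via total variation distance, as one simple proxy), and compares the resulting compression rate with the source entropy. The source probabilities and the codebook size are hypothetical values chosen for illustration; the paper's f-divergence-based quality measures are not reproduced here.

```python
import heapq, math

def tunstall_leaves(probs, max_leaves):
    """Grow a Tunstall parsing tree for a memoryless source: start from the
    single-symbol leaves and repeatedly split the most probable leaf into
    one child per source symbol, while the leaf count stays <= max_leaves.
    Returns a list of (leaf probability, phrase length) pairs."""
    K = len(probs)
    heap = [(-p, 1) for p in probs]   # (-leaf probability, phrase length)
    heapq.heapify(heap)
    leaves = K
    while leaves + (K - 1) <= max_leaves:
        neg_p, d = heapq.heappop(heap)              # most probable leaf
        for p in probs:
            heapq.heappush(heap, (neg_p * p, d + 1))
        leaves += K - 1
    return [(-neg_p, d) for neg_p, d in heap]

probs = [0.7, 0.2, 0.1]            # assumed ternary memoryless source
leaves = tunstall_leaves(probs, 64)
n = len(leaves)
# distance of the leaf PMF from the equiprobable distribution on n points
tv = 0.5 * sum(abs(p - 1.0 / n) for p, _ in leaves)
# fixed-length Tunstall code: ceil(log2 n) bits per parsed phrase
avg_len = sum(p * d for p, d in leaves)
rate = math.ceil(math.log2(n)) / avg_len
H = -sum(p * math.log2(p) for p in probs)
print(f"{n} leaves; total variation to uniform = {tv:.4f}")
print(f"rate = {rate:.4f} bits/symbol >= H = {H:.4f} bits/symbol")
```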
