Strong Data Processing Inequalities for Input Constrained Additive Noise Channels

This paper quantifies the intuitive observation that adding noise reduces available information by means of nonlinear strong data processing inequalities. Consider random variables $W\to X\to Y$ forming a Markov chain, where $Y = X + Z$ with $X$ and $Z$ real-valued and independent, and $X$ bounded in $L_{p}$-norm. It is shown that $I(W; Y) \le F_{I}(I(W;X))$ with $F_{I}(t) < t$ whenever $t > 0$, if and only if $Z$ has a density whose support is not disjoint from any translate of itself. A related question is to characterize for which couplings $(W, X)$ the mutual information $I(W; Y)$ is close to the maximum possible. To that end, it is shown that in order to saturate the channel, i.e., for $I(W; Y)$ to approach capacity, it is necessary that $I(W; X)\to \infty$ (under suitable conditions on the channel). A key ingredient of this result is a deconvolution lemma, which shows that the post-convolution total variation distance bounds the pre-convolution Kolmogorov–Smirnov distance. Explicit bounds are provided for the special case of the additive Gaussian noise channel with a quadratic cost constraint, and these bounds are shown to be order-optimal. For this case, simplified proofs are given that leverage Gaussian-specific tools such as the connection between information and estimation (I-MMSE) and Talagrand's information-transportation inequality.
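
The contraction statement can be checked numerically in simple cases. Below is a minimal Python sketch (not part of the paper; the binary coupling, the noise levels, and the names mixture_pdf and mutual_info_nats are illustrative choices): with W uniform on {-1, +1}, X = W, and Y = X + Z for Gaussian Z, one finds I(W; Y) strictly below I(W; X) = log 2 nats at every noise level, consistent with F_I(t) < t for t > 0.

    import numpy as np
    from scipy.integrate import quad

    # Illustrative setup (my own choice, not from the paper): W uniform on {-1, +1},
    # X = W, so I(W; X) = log 2 nats, and Y = X + Z with Z ~ N(0, sigma^2).
    # Then I(W; Y) = h(Y) - h(Z), where Y is a symmetric two-component Gaussian mixture.

    def mixture_pdf(y, sigma):
        """Density of Y = X + Z for X uniform on {-1, +1} and Z ~ N(0, sigma^2)."""
        norm = 1.0 / np.sqrt(2.0 * np.pi * sigma**2)
        return 0.5 * norm * (np.exp(-(y - 1.0)**2 / (2.0 * sigma**2))
                             + np.exp(-(y + 1.0)**2 / (2.0 * sigma**2)))

    def mutual_info_nats(sigma):
        """I(W; Y) in nats, computed as h(Y) - h(Z) by numerical integration."""
        integrand = lambda y: -mixture_pdf(y, sigma) * np.log(mixture_pdf(y, sigma))
        h_y, _ = quad(integrand, -1.0 - 10.0 * sigma, 1.0 + 10.0 * sigma)
        h_z = 0.5 * np.log(2.0 * np.pi * np.e * sigma**2)  # Gaussian differential entropy
        return h_y - h_z

    if __name__ == "__main__":
        for sigma in (0.25, 0.5, 1.0, 2.0):
            print(f"sigma = {sigma:4.2f}:  I(W;X) = {np.log(2.0):.4f} nats,"
                  f"  I(W;Y) = {mutual_info_nats(sigma):.4f} nats")

The gap between I(W; X) and I(W; Y) widens as sigma grows, matching the intuition that heavier noise dissipates more of the available information.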
