The Interplay Between Entropy and Variational Distance

This paper studies the relation between Shannon entropy and variational distance, two fundamental and frequently used quantities in information theory, by means of bounds on the entropy difference between two probability distributions in terms of the variational distance between them and their alphabet sizes. We also show how to find the distribution that achieves the minimum (or maximum) entropy among all distributions within a given variational distance of any given distribution. These results are applied to several problems of fundamental interest. For entropy estimation, we obtain an analytic formula for the confidence interval, settling a problem that has been open for more than 30 years. For approximation of probability distributions, we find the minimum entropy difference between two distributions in terms of their alphabet sizes and the variational distance between them. In particular, we show that the entropy difference between two distributions that are close in variational distance can be arbitrarily large if the alphabet sizes of the two distributions are unconstrained. For random number generation, we characterize the tradeoff between the amount of randomness required and the distortion measured in variational distance. New tools for non-convex optimization are developed to establish these results.
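The claim that two distributions can be close in variational distance yet far apart in entropy when the alphabet size is unconstrained can be seen with a short numerical sketch. The code below is illustrative only and is not taken from the paper; it assumes the L1 definition of variational distance, V(p, q) = Σ|p(x) − q(x)|, and the helper names `entropy` and `variational_distance` are ad hoc.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability symbols."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def variational_distance(p, q):
    """L1 (variational) distance between two distributions on a common alphabet."""
    return float(np.sum(np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))))

# Compare a point mass with a distribution that moves a small amount of
# probability (eps) uniformly onto n-1 extra symbols.  The variational
# distance stays at 2*eps for every n, but the entropy difference grows
# like eps * log2(n-1), i.e. without bound as the alphabet size n grows.
eps = 0.01
for n in (10, 10**3, 10**6):
    p = np.zeros(n); p[0] = 1.0
    q = np.full(n, eps / (n - 1)); q[0] = 1.0 - eps
    print(n, variational_distance(p, q), entropy(q) - entropy(p))
```

For a fixed ε the variational distance is always 2ε, while the entropy gap is h(ε) + ε log₂(n − 1), which mirrors the statement above that the entropy difference can be made arbitrarily large when the alphabet sizes are unconstrained.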
