Bounds on the Entropy of a Function of a Random Variable and Their Applications

It is well known that the entropy $H(X)$ of a discrete random variable $X$ is always greater than or equal to the entropy $H(f(X))$ of a function $f$ of $X$, with equality if and only if $f$ is one-to-one. In this paper, we give tight bounds on $H(f(X))$ when the function $f$ is not one-to-one, and we illustrate a few scenarios where this matters. As an intermediate step toward our main result, we derive a lower bound on the entropy of a probability distribution when only a bound on the ratio between the maximal and minimal probabilities is known. This lower bound improves on previous results in the literature, and it could find applications outside the present scenario.
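
For context, the well-known inequality $H(f(X)) \le H(X)$ follows from expanding the joint entropy $H(X, f(X))$ in two ways; the short derivation below is the standard textbook argument, not a proof taken from this paper:

$$H(X) = H(X, f(X)) = H(f(X)) + H(X \mid f(X)) \ge H(f(X)),$$

where the first equality uses $H(f(X) \mid X) = 0$ (since $f(X)$ is fully determined by $X$), and equality throughout holds precisely when $H(X \mid f(X)) = 0$, i.e., when $f$ is one-to-one on the support of $X$.

The gap $H(X) - H(f(X)) = H(X \mid f(X))$ is easy to check numerically. The following is a minimal Python sketch; the distribution and the merging function $f$ are hypothetical, chosen only to illustrate a non-injective map:

    import math
    from collections import Counter

    def entropy(probs):
        """Shannon entropy, in bits, of an iterable of probabilities."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Hypothetical distribution of X on {0, 1, 2, 3}.
    p_x = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}

    # A non-injective f: it merges the symbols 2 and 3 into one.
    def f(x):
        return min(x, 2)

    # Push the distribution of X through f to obtain that of f(X).
    p_fx = Counter()
    for x, p in p_x.items():
        p_fx[f(x)] += p

    print(f"H(X)    = {entropy(p_x.values()):.4f} bits")   # ~1.8464
    print(f"H(f(X)) = {entropy(p_fx.values()):.4f} bits")  # ~1.5710, strictly smaller

Here $f$ merges two symbols of positive probability, so $H(X \mid f(X)) > 0$ and the inequality is strict, as expected.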
