A Reverse Jensen Inequality Result with Application to Mutual Information Estimation

The Jensen inequality is a widely used tool in a multitude of fields, such as information theory and machine learning. It can also be used to derive other standard inequalities, such as the inequality of arithmetic and geometric means or the Hölder inequality. In a probabilistic setting, the Jensen inequality describes the relationship between a convex function and the expected value. In this work, we look at the probabilistic setting from the reverse direction of the inequality. We show that, under minimal constraints and with a proper scaling, the Jensen inequality can be reversed. We believe the resulting tool can be helpful for many applications, and we provide a variational estimation of mutual information in which the reverse inequality leads to a new estimator with superior training behavior compared to current estimators.
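For reference, the forward Jensen inequality in the probabilistic setting states that for a convex function $\varphi$ and a random variable $X$, $\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)]$; the paper's contribution is a scaled reverse of this bound. A minimal numerical check of the forward direction, using $\varphi = \exp$ as an example convex function (this choice is illustrative and not taken from the paper):

```python
import math
import random

random.seed(0)

# Draw samples of X ~ N(0, 1) and compare exp(E[X]) with E[exp(X)].
# Jensen's inequality guarantees exp(E[X]) <= E[exp(X)] for convex exp.
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]
mean_x = sum(samples) / len(samples)
mean_exp_x = sum(math.exp(x) for x in samples) / len(samples)

print(math.exp(mean_x) <= mean_exp_x)  # the forward inequality holds
```

For a standard normal, $\mathbb{E}[e^{X}] = e^{1/2} \approx 1.65$ while $e^{\mathbb{E}[X]} = 1$, so the gap is substantial; a reverse Jensen result bounds this gap from the other side under suitable scaling.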
