Neural Estimators for Conditional Mutual Information Using Nearest Neighbors Sampling

The estimation of mutual information (MI) or conditional mutual information (CMI) from a set of samples is a long-standing problem. A recent line of work has leveraged the approximation power of artificial neural networks and has shown improvements over conventional methods. One important challenge in this approach is the need to obtain, from the original dataset, a second set of samples distributed according to a specific product density function; this is particularly challenging when estimating CMI. In this paper, we introduce a new technique, based on $k$ nearest neighbors ($k$-NN), to perform the resampling, and we derive high-confidence concentration bounds for the sample average. The technique is then used to train a neural network classifier, from which the CMI is estimated. We propose three estimators based on this technique and prove their consistency, compare them with similar approaches in the literature, and experimentally demonstrate improvements in both the accuracy and the variance of the resulting CMI estimates.
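
As a rough, self-contained illustration of the pipeline the abstract describes, the Python sketch below implements one plausible reading of it: a $k$-NN permutation step that approximates samples from the conditional product density $p(x|z)\,p(y|z)\,p(z)$, followed by a classifier-based likelihood-ratio plug-in estimate of the CMI. The function names (`knn_resample`, `estimate_cmi`), the use of scikit-learn, and the specific plug-in estimator are illustrative assumptions, not the paper's actual estimators.

```python
# Hypothetical sketch of k-NN resampling plus classifier-based CMI
# estimation; an illustration of the general idea, not the paper's method.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPClassifier

def knn_resample(x, y, z, k=5, rng=None):
    """Approximate samples from p(x|z) p(y|z) p(z).

    x, y, z are 2-D arrays with one sample per row. For each sample i,
    y_i is replaced by the y-value of a uniformly chosen point among the
    k nearest neighbors of z_i in z-space. This breaks the X-Y dependence
    while approximately preserving each variable's dependence on Z.
    """
    rng = np.random.default_rng(rng)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(z)
    _, idx = nn.kneighbors(z)                     # column 0 is the point itself
    choice = rng.integers(1, k + 1, size=len(z))  # pick a neighbor, not self
    y_perm = y[idx[np.arange(len(z)), choice]]
    return np.hstack([x, y_perm, z])

def estimate_cmi(x, y, z, k=5, clip=1e-6):
    """Estimate I(X;Y|Z) via a classifier-based likelihood-ratio plug-in."""
    joint = np.hstack([x, y, z])
    product = knn_resample(x, y, z, k=k)
    samples = np.vstack([joint, product])
    labels = np.concatenate([np.ones(len(joint)), np.zeros(len(product))])
    # Train a classifier to distinguish joint from (approximate) product
    # samples; its posterior odds estimate the density ratio.
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
    clf.fit(samples, labels)
    p = np.clip(clf.predict_proba(joint)[:, 1], clip, 1 - clip)
    # Average log likelihood ratio over the joint samples.
    return float(np.mean(np.log(p / (1 - p))))
```

The key design point the sketch tries to capture is that resampling $y$ among $z$-neighbors yields a surrogate for the product distribution that the classifier must discriminate from the joint; the quality of this surrogate is what the paper's concentration bounds would quantify.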
