Potential conditional mutual information: Estimators and properties

The conditional mutual information I(X;Y|Z) measures the average information that X and Y contain about each other given Z. It is an important primitive in many learning problems, including conditional independence testing, graphical model inference, causal strength estimation, and time-series analysis. In several applications it is desirable to have a functional purely of the conditional distribution p_{Y|X,Z} rather than of the joint distribution p_{X,Y,Z}. We define the potential conditional mutual information as the conditional mutual information computed with respect to a modified joint distribution p_{Y|X,Z} q_{X,Z}, where q_{X,Z} is a potential distribution fixed a priori. We develop k-nearest-neighbor estimators for this functional, employing importance sampling and a coupling trick, and prove the finite-k consistency of such an estimator. We demonstrate that the estimator has excellent practical performance and show an application to dynamical system inference.
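For concreteness, the quantity defined above can be written out as the ordinary conditional mutual information evaluated under the modified joint distribution. This is a minimal sketch in notation introduced here for illustration (the symbols I_q and \tilde{p} are labels of ours, not the paper's); the regularity conditions and the estimator construction are in the paper itself.

% Potential conditional mutual information: the standard CMI formula,
% but with expectations and marginals taken under the modified joint
% \tilde{p}(x,y,z) = p_{Y|X,Z}(y|x,z) q_{X,Z}(x,z),
% where q_{X,Z} is the potential distribution fixed a priori.
\[
  I_q(X;Y \mid Z)
  \;=\;
  \mathbb{E}_{\tilde{p}}\!\left[
    \log \frac{\tilde{p}(X,Y,Z)\,\tilde{p}(Z)}{\tilde{p}(X,Z)\,\tilde{p}(Y,Z)}
  \right],
  \qquad
  \tilde{p}(x,y,z) \;=\; p_{Y\mid X,Z}(y\mid x,z)\, q_{X,Z}(x,z).
\]

Note that when q_{X,Z} equals the observed marginal p_{X,Z}, this reduces to the usual conditional mutual information I(X;Y|Z).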
