ITENE: Intrinsic Transfer Entropy Neural Estimator

Quantifying the directionality of information flow is instrumental in understanding, and possibly controlling, the operation of many complex systems, such as transportation, social, neural, or gene-regulatory networks. The standard Transfer Entropy (TE) metric follows Granger's causality principle by measuring the Mutual Information (MI) between the past states of a source signal $X$ and the future state of a target signal $Y$ while conditioning on past states of $Y$. Hence, the TE quantifies the improvement, as measured by the log-loss, in the prediction of the target sequence $Y$ that can be accrued when, in addition to the past of $Y$, past samples of $X$ are also available. However, by conditioning on the past of $Y$, the TE also measures information that can only be extracted synergistically by observing the past of both $X$ and $Y$, and not from the past of $X$ alone. Building on a secret key agreement formulation, the Intrinsic TE (ITE) aims to discount such synergistic information in order to quantify the degree to which $X$ is \emph{individually} predictive of $Y$, independently of $Y$'s past. In this paper, an estimator of the ITE is proposed that is inspired by the recently proposed Mutual Information Neural Estimation (MINE). The estimator is based on a variational bound on the Kullback-Leibler (KL) divergence, two-sample neural network classifiers, and the pathwise estimator of Monte Carlo gradients.
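As a point of reference only (this is not the paper's ITENE architecture), the following self-contained PyTorch sketch estimates the standard TE, $T_{X \to Y} = I(X_{t-1}; Y_t \mid Y_{t-1})$, on a toy linear system via the chain-rule decomposition $I(X_{t-1}, Y_{t-1}; Y_t) - I(Y_{t-1}; Y_t)$, with each MI term estimated by a MINE-style Donsker-Varadhan bound. All network sizes, step counts, and the toy data model are illustrative assumptions.

# Minimal MINE-style TE sketch (illustrative; not the ITENE estimator).
# TE(X -> Y) = I(X_past, Y_past ; Y_future) - I(Y_past ; Y_future),
# with each MI term lower-bounded by the Donsker-Varadhan representation.
import math
import torch
import torch.nn as nn

class Statistic(nn.Module):
    """Scalar statistic T(a, b) used inside the Donsker-Varadhan bound."""
    def __init__(self, dim_a, dim_b, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_a + dim_b, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, a, b):
        return self.net(torch.cat([a, b], dim=-1)).squeeze(-1)

def dv_mi_estimate(stat, a, b, steps=1000, lr=1e-3):
    """Maximize E_p[T] - log E_q[exp(T)], where q shuffles b across the batch."""
    opt = torch.optim.Adam(stat.parameters(), lr=lr)
    n = b.shape[0]
    for _ in range(steps):
        b_shuf = b[torch.randperm(n)]
        joint = stat(a, b).mean()
        marg = torch.logsumexp(stat(a, b_shuf), dim=0) - math.log(n)
        loss = -(joint - marg)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        b_shuf = b[torch.randperm(n)]
        return (stat(a, b).mean()
                - torch.logsumexp(stat(a, b_shuf), dim=0) + math.log(n)).item()

# Toy linearly coupled pair: y_{t+1} = 0.5 * y_t + 0.5 * x_t + noise.
torch.manual_seed(0)
n = 4000
x = torch.randn(n + 1)
y = torch.zeros(n + 1)
for t in range(n):
    y[t + 1] = 0.5 * y[t] + 0.5 * x[t] + 0.1 * torch.randn(())
x_past, y_past, y_fut = x[:-1, None], y[:-1, None], y[1:, None]

mi_joint = dv_mi_estimate(Statistic(2, 1), torch.cat([x_past, y_past], -1), y_fut)
mi_self = dv_mi_estimate(Statistic(1, 1), y_past, y_fut)
print("TE(X -> Y) estimate:", mi_joint - mi_self)

Note that the ITE additionally involves the minimization over auxiliary processings of $Y$'s past implied by the intrinsic conditional information of the secret key agreement formulation; this sketch does not attempt that step.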
